Re: [Rd] delayedAssign

2007-09-27 Thread Luke Tierney
On Wed, 26 Sep 2007, Gabor Grothendieck wrote:

 I thought that perhaps the behavior in the previous post,
 while inconsistent with the documentation, was not all that
 harmful, but I think it's related to the following, which is a potentially
 serious bug.

The previous discussion already established that as.list of an
environment should not return a list with promises in it, as promises
should not be visible at the R level.  (Another loophole that needs
closing is $ for environments.)  So the behavior of results that should not
exist is undefined, and I cannot see how any such behavior is a further
bug, serious or otherwise.

 z is a list with a single numeric component,
 as the dput output verifies,

Except it isn't, as print or str verify, which might be a problem if z
was an input these functions should expect, but it isn't.

 yet we cannot compare its first element
 to 7 without getting an error message.

 Later on we see that it's because it thinks that z[[1]] is of type promise

As z[[1]] is in fact of type promise that would seem a fairly
reasonable thing to think at this point ...

 and even force(z[[1]]) is of type promise.

which is consistent with what force is documented to do. The
documentation is quite explicit that force does not do what you seem
to be expecting.  That documentation is from a time when delay()
existed to produce promises at the R level, which was a nightmare
because of all the peculiarities it introduced, which is why it was
removed.

force is intended for one thing only -- replacing code like this:

   # I know the following line looks really stupid and you will be
   # tempted to remove it for efficiency but DO NOT: it is needed
   # to make sure that the formal argument y is evaluated at this
   # point.
   y <- y

with this:

  force(y)

which seems much clearer -- at least it suggests you look at the help
page for force to see what it does.
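
A minimal sketch (not from the original message; make_adder and add2 are
made-up names) of the function-factory situation force() is meant for:

make_adder <- function(i) {
  force(i)                  # same effect as the old 'y <- y' idiom: evaluate now
  function(x) x + i
}
add2 <- make_adder(1 + 1)   # the argument 1 + 1 is evaluated here, not later
add2(10)                    # 12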

At this point promises should only ever exist in bindings in
environments. If we wanted lazy evaluation constructs more widely
there are really only two sensible options:

 The Scheme option, where a special function delay creates a deferred
 evaluation and another, called force in Scheme, forces the evaluation,
 but there is no implicit forcing

or

 The Haskell option, where data structures are created lazily, so

 z <- list(f(x))

 would create a list with a deferred evaluation, but any attempt to
 access the value of z would force the evaluation. So printing z,
 for example, would force the evaluation but

y <- z[[1]]

 would not.

It is easy enough to create a Delay/Force pair that behaves like
Scheme's with the tools available in R if that is what you want.
Haskell, and other fully lazy functional languages, are very
interesting but very different animals from R. For some reason you
seem to be expecting some combination of Scheme and Haskell behavior.
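
A minimal sketch, using only standard R tools, of the kind of Scheme-style
Delay/Force pair described above (Delay and Force here are my own
illustrative functions, not part of R):

Delay <- function(expr) {
  expr <- substitute(expr)     # capture the unevaluated expression
  env  <- parent.frame()       # and the environment it should be evaluated in
  done <- FALSE
  value <- NULL
  function() {
    if (!done) {
      value <<- eval(expr, env)
      done  <<- TRUE
    }
    value                      # evaluated at most once, then cached
  }
}
Force <- function(d) d()

d <- Delay({cat("evaluating\n"); 1 + 2})
Force(d)   # prints "evaluating" and returns 3
Force(d)   # returns the cached 3; nothing else forces it implicitly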

Best,

luke


 f <- function(x) environment()
 z <- as.list(f(7))
 dput(z)
 structure(list(x = 7), .Names = "x")
 z[[1]] == 7
 Error in z[[1]] == 7 :
  comparison (1) is possible only for atomic and list types
 force(z[[1]]) == 7
 Error in force(z[[1]]) == 7 :
  comparison (1) is possible only for atomic and list types

 typeof(z)
 [1] "list"
 typeof(z[[1]])
 [1] "promise"
 typeof(force(z[[1]]))
 [1] "promise"
 R.version.string # Vista
 [1] "R version 2.6.0 beta (2007-09-23 r42958)"


 On 9/19/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 The last two lines of example(delayedAssign) give this:

 e <- (function(x, y = 1, z) environment())(1+2, "y", {cat(" HO! "); pi+2})
 (le <- as.list(e)) # evaluates the promises
 $x
 <promise: 0x032b31f8>
 $y
 <promise: 0x032b3230>
 $z
 <promise: 0x032b3268>

 which, contrary to the comment, appears unevaluated.  Is the comment
 wrong, or is it supposed to return an evaluated result but doesn't?

 R.version.string # Vista
 [1] "R version 2.6.0 alpha (2007-09-06 r42791)"


 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                     Phone:  319-335-3386
Department of Statistics and           Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                     email:  [EMAIL PROTECTED]
Iowa City, IA 52242                    WWW:    http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] delayedAssign

2007-09-27 Thread Gabor Grothendieck
Thanks for the explanation.

For lists, either (a) promises should be evaluated as they
enter the list, or (b) promises should be evaluated as they exit the
list (i.e. as they are compared, inspected, etc.).  I gather
the intent was (a), but it does not happen that way due to
a bug in R.  Originally I thought (b) would then occur, but
my surprise was that it does not occur either, which is why
I feel it's more serious than I had originally thought.

I think it's OK if promises only exist in environments and not
lists.  Items on my wishlist would be to be able
to do at the R level the two things mentioned previously

https://stat.ethz.ch/pipermail/r-devel/2007-September/046943.html

and thirdly an ability to get the evaluation environment, not just the
expression, associated with a promise -- substitute only gets the expression.
Originally I thought I would need some or all of these wish items,
then thought not, but am back to the original situation again as I use
them more and realize that they are at least important
for debugging (it's very difficult to debug situations involving promises, as
there is no way to inspect the evaluation environment, so you are never sure
which environment a given promise is evaluating in) and possibly
for writing programs as well.
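
For context on that third wish: at the R level substitute() recovers only
the expression behind a promise; nothing exposes the environment it will be
evaluated in. A small sketch of what is currently visible (f is a made-up
example, not code from the thread):

f <- function(x) {
  print(substitute(x))   # the expression carried by the promise: a + b
  environment()          # f's own frame -- not the environment where a + b
                         # will eventually be evaluated (the caller's)
}
f(a + b)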

On 9/27/07, Luke Tierney [EMAIL PROTECTED] wrote:
 On Wed, 26 Sep 2007, Gabor Grothendieck wrote:

  I thought that perhaps the behavior in the previous post,
  while inconsistent with the documentation, was not all that
  harmful, but I think it's related to the following, which is a potentially
  serious bug.

 The previous discussion already established that as.list of an
 environment should not return a list with promises in it, as promises
 should not be visible at the R level.  (Another loophole that needs
 closing is $ for environments.)  So the behavior of results that should not
 exist is undefined, and I cannot see how any such behavior is a further
 bug, serious or otherwise.

  z is a list with a single numeric component,
  as the dput output verifies,

 Except it isn't, as print or str verify, which might be a problem if z
 was an input these functions should expect, but it isn't.

  yet we cannot compare its first element
  to 7 without getting an error message.
 
  Later on we see that it's because it thinks that z[[1]] is of type promise

 As z[[1]] is in fact of type promise that would seem a fairly
 reasonable thing to think at this point ...

  and even force(z[[1]]) is of type promise.

 which is consistent with what force is documented to do. The
 documentation is quite explicit that force does not do what you seem
 to be expecting.  That documentation is from a time when delay()
 existed to produce promises at the R level, which was a nightmare
 because of all the peculiarities it introduced, which is why it was
 removed.

 force is intended for one thing only -- replacing code like this:

   # I know the following line looks really stupid and you will be
   # tempted to remove it for efficiency but DO NOT: it is needed
   # to make sure that the formal argument y is evaluated at this
   # point.
   y <- y

 with this:

  force(y)

 which seems much clearer -- at least it suggests you look at the help
 page for force to see what it does.

 At this point promises should only ever exist in bindings in
 environments. If we wanted lazy evaluation constructs more widely
 there are really only two sensible options:

 The Scheme option where a special function delay creates a deferred
 evaluation and another, called force in Scheme, forces the evaluation
 but there is no implicit forcing

 or

 The Haskell option, where data structures are created lazily, so

 z <- list(f(x))

 would create a list with a deferred evaluation, but any attempt to
 access the value of z would force the evaluation. So printing z,
 for example, would force the evaluation but

 y <- z[[1]]

 would not.

 It is easy enough to create a Delay/Force pair that behaves like
 Scheme's with the tools available in R if that is what you want.
 Haskell, and other fully lazy functional languages, are very
 interesting but very different animals from R. For some reason you
 seem to be expecting some combination of Scheme and Haskell behavior.

 Best,

 luke

 
  f <- function(x) environment()
  z <- as.list(f(7))
  dput(z)
  structure(list(x = 7), .Names = "x")
  z[[1]] == 7
  Error in z[[1]] == 7 :
   comparison (1) is possible only for atomic and list types
  force(z[[1]]) == 7
  Error in force(z[[1]]) == 7 :
   comparison (1) is possible only for atomic and list types

  typeof(z)
  [1] "list"
  typeof(z[[1]])
  [1] "promise"
  typeof(force(z[[1]]))
  [1] "promise"
  R.version.string # Vista
  [1] "R version 2.6.0 beta (2007-09-23 r42958)"
 
 
  On 9/19/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
  The last two lines of example(delayedAssign) give this:
 
  e <- (function(x, y = 1, z) environment())(1+2, "y", {cat(" HO! "); pi+2})
  (le <- 

Re: [Rd] rJava and RJDBC

2007-09-27 Thread Simon Urbanek
Joe,

which version of R and RJDBC are you using? The behavior you describe  
should have been fixed in RJDBC 0.1-4. Please try the latest version  
from rforge:
install.packages("RJDBC",, "http://rforge.net/")
and please let me know if that solves your problem.

Cheers,
Simon
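
For reference, a sketch of the call the original post appears to be aiming
for once the quoting is intact; the jar path comes from Joe's message, and
the MySQL URL and credentials are untested placeholders:

library(RJDBC)
classPath <- "C:\\libraries\\mysql-connector-java-5.1.3-rc\\mysql-connector-java-5.1.3-rc-bin.jar"
drv <- JDBC("com.mysql.jdbc.Driver", classPath, identifier.quote = "`")
con <- dbConnect(drv, "jdbc:mysql://localhost/test", "user", "password")
dbGetQuery(con, "SELECT 1")
dbDisconnect(con)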


On Sep 26, 2007, at 10:03 PM, Joe W. Byers wrote:

 I am desperate for help.

 I am trying to get RJDBC and rJava 0.5 to work on both my Windows XP
 and Linux Red Hat EL5 servers.  On both I get a
 java.lang.ClassNotFoundException when calling JDBC().

 My example is
 require(RJDBC)
 classPath='C:\\libraries\\mysql-connector-java-5.1.3-rc\\mysql-connector-java-5.1.3-rc-bin.jar'
 driverClass=c("com.mysql.jdbc.Driver")
 drv <- JDBC(c("com.mysql.jdbc.Driver"), classPath, "`")


 This returns a NULL value and a java exception.
 .jgetEx()
 [1] "Java-Object{java.lang.ClassNotFoundException: com.mysql.jdbc.Driver}"
 my java version is
 .jcall('java.lang.System','S','getProperty','java.version')
 [1] "1.6.0_02"
 jre


 When I use java 1.5.0_11 jre I have the same problem but the .jgetEx()
 is
 .jgetEx()
 [1] Java-Object{}

 my class path is
 .jclassPath()
  [1] "C:\\PROGRA~1\\R\\library\\rJava\\java"
  [2] "."
  [3] "C:\\libraries\\mysql-connector-java-5.1.3-rc\\mysql-connector-java-5.1.3-rc-bin.jar"
  [4] "C:\\libraries\\xmlbeans-2.0.0-beta1\\lib\\xbean.jar"
  [5] "C:\\libraries\\POI\\poi-2.5.1-final-20040804.jar"
  [6] "C:\\libraries\\POI\\poi-contrib-2.5.1-final-20040804.jar"
  [7] "C:\\libraries\\POI\\poi-scratchpad-2.5.1-final-20040804.jar"
  [8] "C:\\Libraries\\PJM\\eDataFeed.jar"
  [9] "C:\\Libraries\\PJM\\webserviceclient.jar"
 [10] "C:\\Java\\Libraries\\QTJava.zip"

 My java.home is
 .jcall('java.lang.System','S','getProperty','java.home')
 [1] "C:\\Java\\jre1.6.0_02"


 I have tried breaking down the JDBC() call as
 .jinit() or .jinit(classPath)
 v <- .jcall("java/lang/ClassLoader", "Ljava/lang/ClassLoader;",
  "getSystemClassLoader")
 .jcall("java/lang/Class", "Ljava/lang/Class;",
  "forName", as.character(driverClass)[1], TRUE, v)
   to no avail.

 I have tried different versions of the mysql jar.

 I do not know if my java version is not compatible, my java settings are
 wrong, or I am just blind to the problem.  This is the same for
 both my Windows XP and Red Hat EL5 servers.

 I really appreciate any and all assistance.

 Thank you
 Joe

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] modifying large R objects in place

2007-09-27 Thread Petr Savicky
On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
 For the most part, doing anything to an R object results in its
 duplication. You generally have to do a lot of work to NOT copy an R
 object.

Thank you for your response. Unfortunately, you are right. For example,
the allocated memory reported by the top command on Linux may change during
a session as follows:
  a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
  a[1,1] <- 0 # 3.0g
  gc() # 1.5g

In the current application, I modify the matrix only using my own C code
and only read it at the R level. So, the above is not a big problem for me
(at least not now).

However, there is a related thing, which could be a bug. The following
code determines the value of NAMED field in SEXP header of an object:

  SEXP getnamed(SEXP a)
  {
  SEXP out;
  PROTECT(out = allocVector(INTSXP, 1));
  INTEGER(out)[0] = NAMED(a);
  UNPROTECT(1);
  return(out);
  }

Now, consider the following session

  u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
  .Call("getnamed",u) # 1 (OK)

  length(u)
  .Call("getnamed",u) # 1 (OK)

  dim(u)
  .Call("getnamed",u) # 1 (OK)

  nrow(u)
  .Call("getnamed",u) # 2 (why?)

  u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
  .Call("getnamed",u) # 1 (OK)
  ncol(u)
  .Call("getnamed",u) # 2 (so, ncol does the same)

Is this a bug?

Petr Savicky.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] modifying large R objects in place

2007-09-27 Thread Petr Savicky
In my previous email, I sent the example:
   a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
   a[1,1] <- 0 # 3.0g
   gc() # 1.5g

This is misleading. The correct version is
   a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
   a[1,1] <- as.integer(0) # 1.5g
   gc() # 774m

So, the object duplicates, but nothing more.

The main part of my previous email (question concerning
a possible bug in the behavior of nrow(a) and ncol(a))
remains open.

Petr Savicky.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] delayedAssign

2007-09-27 Thread Luke Tierney
On Thu, 27 Sep 2007, Gabor Grothendieck wrote:

 Thanks for the explanation.

 For lists, either (a) promises should be evaluated as they
 enter the list, or (b) promises should be evaluated as they exit the
 list (i.e. as they are compared, inspected, etc.).

What makes you conclude that this is what should happen?

Again, promises are internal.  We could, and maybe will, eliminate
promises in favor of a mark on bindings in environments that indicates
that they need to be evaluated.  At the R level this would produce the
same behavior as we currently (intend to) have.

If we allowed lazy structures outside of bindings then I still don't
see how (b) should happen.  With Scheme-like semantics we would
definitely NOT want this to happen; with Haskell-like semantics any
attempt to look at the value (including printing) would result in
evaluation (and replacing the promise/thunk/whatever by its value).

 I gather
 the intent was (a), but it does not happen that way due to
 a bug in R.  Originally I thought (b) would then occur, but
 my surprise was that it does not occur either, which is why
 I feel it's more serious than I had originally thought.

 I think it's OK if promises only exist in environments and not
 lists.  Items on my wishlist would be to be able
 to do at the R level the two things mentioned previously

 https://stat.ethz.ch/pipermail/r-devel/2007-September/046943.html

I am still not persuaded that tools for inspecting environments are
worth the time and effort required but I am prepared to be.

Best,

luke


 and thirdly an ability to get the evaluation environment, not just the
 expression,
 associated with a promise -- substitute only gets the expression.
 Originally I thought I would need some or all of these wish items,
 then thought not, but am back to the original situation again as I use
 them more and realize that they are at least important
 for debugging (it's very difficult to debug situations involving promises, as
 there is no way to inspect the evaluation environment, so you are never sure
 which environment a given promise is evaluating in) and possibly
 for writing programs as well.

 On 9/27/07, Luke Tierney [EMAIL PROTECTED] wrote:
 On Wed, 26 Sep 2007, Gabor Grothendieck wrote:

 I thought that perhaps the behavior in the previous post,
 while inconsistent with the documentation, was not all that
 harmful, but I think it's related to the following, which is a potentially
 serious bug.

 The previous discussion already established that as.list of an
 environment should not return a list with promises in it, as promises
 should not be visible at the R level.  (Another loophole that needs
 closing is $ for environments.)  So the behavior of results that should not
 exist is undefined, and I cannot see how any such behavior is a further
 bug, serious or otherwise.

 z is a list with a single numeric component,
 as the dput output verifies,

 Except it isn't, as print or str verify, which might be a problem if z
 was an input these functions should expect, but it isn't.

 yet we cannot compare its first element
 to 7 without getting an error message.

 Later on we see that it's because it thinks that z[[1]] is of type promise

 As z[[1]] is in fact of type promise that would seem a fairly
 reasonable thing to think at this point ...

 and even force(z[[1]]) is of type promise.

 which is consistent with what force is documented to do. The
 documentation is quite explicit that force does not do what you seem
 to be expecting.  That documentation is from a time when delay()
 existed to produce promises at the R level, which was a nightmare
 because of all the peculiarities it introduced, which is why it was
 removed.

 force is intended for one thing only -- replacing code like this:

   # I know the following line looks really stupid and you will be
   # tempted to remove it for efficiency but DO NOT: it is needed
   # to make sure that the formal argument y is evaluated at this
   # point.
   y <- y

 with this:

  force(y)

 which seems much clearer -- at least it suggests you look at the help
 page for force to see what it does.

 At this point promises should only ever exist in bindings in
 environments. If we wanted lazy evaluation constructs more widely
 there are really only two sensible options:

 The Scheme option where a special function delay creates a deferred
 evaluation and another, called force in Scheme, forces the evaluation
 but there is no implicit forcing

 or

 The Haskell option, where data structures are created lazily, so

 z <- list(f(x))

 would create a list with a deferred evaluation, but any attempt to
 access the value of z would force the evaluation. So printing z,
 for example, would force the evaluation but

y <- z[[1]]

 would not.

 It is easy enough to create a Delay/Force pair that behaves like
 Scheme's with the tools available in R if that is what you want.
 Haskell, and other fully lazy functional 

Re: [Rd] rJava and RJDBC

2007-09-27 Thread Joe W Byers
Simon Urbanek simon.urbanek at r-project.org writes:

 
 Joe,
 
 which version of R and RJDBC are you using? The behavior you describe  
 should have been fixed in RJDBC 0.1-4. Please try the latest version  
 from rforge
  install.packages("RJDBC",, "http://rforge.net/")
 and please let me know if that solves your problem.
 
 Cheers,
 Simon

Simon,

Thank you so much.  I have been working on this for a week.

I also had not been using rforge.net as a repository, only the default
mirrors (usually the IL mirror) to get my package updates.

This really rocks!

Again, thank you and have a wonderful day.

Joe

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] modifying large R objects in place

2007-09-27 Thread Peter Dalgaard
Petr Savicky wrote:
 On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
   
 For the most part, doing anything to an R object results in its
 duplication. You generally have to do a lot of work to NOT copy an R
 object.
 

 Thank you for your response. Unfortunately, you are right. For example,
 the allocated memory reported by the top command on Linux may change during
 a session as follows:
   a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
   a[1,1] <- 0 # 3.0g
   gc() # 1.5g

 In the current application, I modify the matrix only using my own C code
 and only read it at the R level. So, the above is not a big problem for me
 (at least not now).

 However, there is a related thing, which could be a bug. The following
 code determines the value of NAMED field in SEXP header of an object:

   SEXP getnamed(SEXP a)
   {
   SEXP out;
   PROTECT(out = allocVector(INTSXP, 1));
   INTEGER(out)[0] = NAMED(a);
   UNPROTECT(1);
   return(out);
   }

 Now, consider the following session

   u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
   .Call("getnamed",u) # 1 (OK)

   length(u)
   .Call("getnamed",u) # 1 (OK)

   dim(u)
   .Call("getnamed",u) # 1 (OK)

   nrow(u)
   .Call("getnamed",u) # 2 (why?)

   u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
   .Call("getnamed",u) # 1 (OK)
   ncol(u)
   .Call("getnamed",u) # 2 (so, ncol does the same)

 Is this a bug?
   
No. It is an infelicity.

The issues are that
1. length() and dim() call .Primitive directly, whereas nrow() and
ncol() are real R functions
2. NAMED records whether an object has _ever_ had 0, 1, or 2+ names

During the evaluation of ncol(u), the argument x is evaluated, and at
that point the object u is also named x in the evaluation frame of
ncol(). A full(er) reference counting system might drop NAMED back to 1
when exiting ncol(), but currently R can only count up (and trying to
find the conditions under which it is safe to reduce NAMED will make
your head spin, believe me!)
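
A quick way to see the distinction (a sketch; the printed body of nrow may
differ slightly between R versions):

is.primitive(dim)    # TRUE  -- no R-level evaluation frame is created
is.primitive(nrow)   # FALSE -- an ordinary closure
nrow                 # function (x) dim(x)[1], so calling nrow(u) binds u to x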
 Petr Savicky.

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
   


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])        FAX: (+45) 35327907

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] modifying large R objects in place

2007-09-27 Thread Prof Brian Ripley
1) You implicitly coerced 'a' to be numeric and thereby (almost) doubled 
its size: did you intend to?  Does that explain your confusion?


2) I expected NAMED on 'a' to be incremented by nrow(a): here is my 
understanding.

When you called nrow(a) you created another reference to 'a' in the 
evaluation frame of nrow.  (At a finer level you first created a promise 
to 'a' and then dim(x) evaluated that promise, which did SET_NAMED(SEXP) 
= 2.)  So NAMED(a) was correctly bumped to 2, and it is never reduced.

More generally, any argument to a closure that actually gets used will 
get NAMED set to 2.

Having too high a value of NAMED could never be a 'bug'.  See the 
explanation in the R Internals manual:

   When an object is about to be altered, the named field is consulted. A
   value of 2 means that the object must be duplicated before being
   changed.  (Note that this does not say that it is necessary to
   duplicate, only that it should be duplicated whether necessary or not.)


3) Memory profiling can be helpful in telling you exactly what copies get 
made.
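
To make point 1 concrete, a small sketch (mine, not from the thread):
assigning the double constant 0 coerces the whole integer matrix to double,
while an integer value does not.

a <- matrix(as.integer(1), nrow = 3, ncol = 3)
typeof(a)                  # "integer"
a[1, 1] <- 0               # 0 is a double constant, so the matrix is coerced
typeof(a)                  # "double" -- roughly twice the storage

b <- matrix(as.integer(1), nrow = 3, ncol = 3)
b[1, 1] <- as.integer(0)   # or 0L; no coercion, the type stays the same
typeof(b)                  # "integer"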



On Thu, 27 Sep 2007, Petr Savicky wrote:

 On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
 For the most part, doing anything to an R object results in its
 duplication. You generally have to do a lot of work to NOT copy an R
 object.

 Thank you for your response. Unfortunately, you are right. For example,
 the allocated memory reported by the top command on Linux may change during
 a session as follows:
  a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
  a[1,1] <- 0 # 3.0g
  gc() # 1.5g

 In the current application, I modify the matrix only using my own C code
 and only read it at the R level. So, the above is not a big problem for me
 (at least not now).

 However, there is a related thing, which could be a bug. The following
 code determines the value of NAMED field in SEXP header of an object:

  SEXP getnamed(SEXP a)
  {
  SEXP out;
  PROTECT(out = allocVector(INTSXP, 1));
  INTEGER(out)[0] = NAMED(a);
  UNPROTECT(1);
  return(out);
  }

 Now, consider the following session

  u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
  .Call("getnamed",u) # 1 (OK)

  length(u)
  .Call("getnamed",u) # 1 (OK)

  dim(u)
  .Call("getnamed",u) # 1 (OK)

  nrow(u)
  .Call("getnamed",u) # 2 (why?)

  u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
  .Call("getnamed",u) # 1 (OK)
  ncol(u)
  .Call("getnamed",u) # 2 (so, ncol does the same)

 Is this a bug?

 Petr Savicky.

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Aggregate factor names

2007-09-27 Thread Mike Lawrence
Hi all,

A suggestion derived from discussions amongst a number of R users in
my research group: set the default column names produced by
aggregate() equal to the names of the objects in the list passed to
the 'by' object.

ex. it is annoying to type

with(
my.data
,aggregate(
my.dv
,list(
one.iv = one.iv
,another.iv = another.iv
,yet.another.iv = yet.another.iv
)
,some.function
)
)

to yield a data frame with names =
c('one.iv','another.iv','yet.another.iv','x'), when this seems more
economical:

with(
my.data
,aggregate(
my.dv
,list(
one.iv
,another.iv
,yet.another.iv
)
,some.function
)
)

--
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University

Website: http://memetic.ca

Public calendar: http://icalx.com/public/informavore/Public

The road to wisdom? Well, it's plain and simple to express:
Err and err and err again, but less and less and less.
- Piet Hein

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Aggregate factor names

2007-09-27 Thread Gabor Grothendieck
You can do this:

aggregate(iris[-5], iris[5], mean)


On 9/27/07, Mike Lawrence [EMAIL PROTECTED] wrote:
 Hi all,

 A suggestion derived from discussions amongst a number of R users in
 my research group: set the default column names produced by aggregate
 () equal to the names of the objects in the list passed to the 'by'
 object.

 ex. it is annoying to type

 with(
my.data
,aggregate(
my.dv
,list(
one.iv = one.iv
,another.iv = another.iv
,yet.another.iv = yet.another.iv
)
,some.function
)
 )

 to yield a data frame with names = c
 ('one.iv','another.iv','yet.another.iv','x') when this seems more
 economical:

 with(
my.data
,aggregate(
my.dv
,list(
one.iv
,another.iv
,yet.another.iv
)
,some.function
)
 )

 --
 Mike Lawrence
 Graduate Student, Department of Psychology, Dalhousie University

 Website: http://memetic.ca

 Public calendar: http://icalx.com/public/informavore/Public

 The road to wisdom? Well, it's plain and simple to express:
 Err and err and err again, but less and less and less.
- Piet Hein

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Aggregate factor names

2007-09-27 Thread Mike Lawrence
Understood, but my point is that the naming I suggest should be the  
default. One should not be 'punished' for being explicit in calling  
aggregate.


On 27-Sep-07, at 1:06 PM, Gabor Grothendieck wrote:

 You can do this:

 aggregate(iris[-5], iris[5], mean)


 On 9/27/07, Mike Lawrence [EMAIL PROTECTED] wrote:
 Hi all,

 A suggestion derived from discussions amongst a number of R users in
 my research group: set the default column names produced by aggregate
 () equal to the names of the objects in the list passed to the 'by'
 object.

 ex. it is annoying to type

 with(
my.data
,aggregate(
my.dv
,list(
one.iv = one.iv
,another.iv = another.iv
,yet.another.iv = yet.another.iv
)
,some.function
)
 )

 to yield a data frame with names = c
 ('one.iv','another.iv','yet.another.iv','x') when this seems more
 economical:

 with(
my.data
,aggregate(
my.dv
,list(
one.iv
,another.iv
,yet.another.iv
)
,some.function
)
 )

 --
 Mike Lawrence
 Graduate Student, Department of Psychology, Dalhousie University

 Website: http://memetic.ca

 Public calendar: http://icalx.com/public/informavore/Public

 The road to wisdom? Well, it's plain and simple to express:
 Err and err and err again, but less and less and less.
- Piet Hein

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


--
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University

Website: http://memetic.ca

Public calendar: http://icalx.com/public/informavore/Public

The road to wisdom? Well, it's plain and simple to express:
Err and err and err again, but less and less and less.
- Piet Hein

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Aggregate factor names

2007-09-27 Thread Gabor Grothendieck
You can do this too:

aggregate(iris[-5], iris["Species"], mean)

or this:

with(iris, aggregate(iris[-5], data.frame(Species), mean))

or this:

attach(iris)
aggregate(iris[-5], data.frame(Species), mean)

The point is that you already don't have to write x = x.  The only
reason you are writing it that way is that you are using list instead
of data.frame.  Just use data.frame or appropriate indexing as shown.
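
Applied to the example from the original post (my.data, my.dv and the *.iv
columns are the hypothetical names used there), that would look something
like:

# data.frame() names its columns after the variables,
# so there is no need to write one.iv = one.iv
with(my.data,
     aggregate(my.dv,
               data.frame(one.iv, another.iv, yet.another.iv),
               some.function))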

On 9/27/07, Mike Lawrence [EMAIL PROTECTED] wrote:
 Understood, but my point is that the naming I suggest should be the
 default. One should not be 'punished' for being explicit in calling
 aggregate.


 On 27-Sep-07, at 1:06 PM, Gabor Grothendieck wrote:

  You can do this:
 
  aggregate(iris[-5], iris[5], mean)
 
 
  On 9/27/07, Mike Lawrence [EMAIL PROTECTED] wrote:
  Hi all,
 
  A suggestion derived from discussions amongst a number of R users in
  my research group: set the default column names produced by aggregate
  () equal to the names of the objects in the list passed to the 'by'
  object.
 
  ex. it is annoying to type
 
  with(
 my.data
 ,aggregate(
 my.dv
 ,list(
 one.iv = one.iv
 ,another.iv = another.iv
 ,yet.another.iv = yet.another.iv
 )
 ,some.function
 )
  )
 
  to yield a data frame with names = c
  ('one.iv','another.iv','yet.another.iv','x') when this seems more
  economical:
 
  with(
 my.data
 ,aggregate(
 my.dv
 ,list(
 one.iv
 ,another.iv
 ,yet.another.iv
 )
 ,some.function
 )
  )
 
  --
  Mike Lawrence
  Graduate Student, Department of Psychology, Dalhousie University
 
  Website: http://memetic.ca
 
  Public calendar: http://icalx.com/public/informavore/Public
 
  The road to wisdom? Well, it's plain and simple to express:
  Err and err and err again, but less and less and less.
 - Piet Hein
 
  __
  R-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-devel
 

 --
 Mike Lawrence
 Graduate Student, Department of Psychology, Dalhousie University

 Website: http://memetic.ca

 Public calendar: http://icalx.com/public/informavore/Public

 The road to wisdom? Well, it's plain and simple to express:
 Err and err and err again, but less and less and less.
- Piet Hein




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Unnecessary extra copy with matrix(..., dimnames=NULL) (Was: Re: modifying large R objects in place)

2007-09-27 Thread Henrik Bengtsson
As others have already mentioned, in your example you are first creating an
integer matrix and then coercing it to a double matrix by assigning
the double value 0 to element [1,1].  However, even when correcting for this
mistake, there is an extra copy created when using matrix().

Try this in a fresh vanilla R session:

 print(gc())
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 136684  3.7 35  9.4   35  9.4
Vcells  81026  0.7 786432  6.0   473127  3.7
 x <- matrix(1, nrow=5000, ncol=5000)
 print(gc())
   used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   136793   3.7 35   9.4   35   9.4
Vcells 25081043 191.4   27989266 213.6 25081056 191.4
 x[1,1] <- 2
 print(gc())
   used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   136797   3.7 35   9.4   35   9.4
Vcells 25081044 191.4   52830254 403.1 50081058 382.1

So, yes, in that x[1,1] <- 2 assignment an extra copy is created.  It
is related to the fact that there is a NAMED matrix object being
created inside matrix(), cf. the last lines of matrix():

x <- .Internal(matrix(data, nrow, ncol, byrow))
dimnames(x) <- dimnames
x

Here is a patch for matrix() that avoids this problem *when dimnames
is NULL* (which is many time the case):

matrix <- function(data=NA, nrow=1, ncol=1, byrow=FALSE, dimnames=NULL) {
  data <- as.vector(data);

  if(missing(nrow)) {
    nrow <- ceiling(length(data)/ncol);
  } else if(missing(ncol)) {
    ncol <- ceiling(length(data)/nrow);
  }

  # Trick to avoid extra copy in the case when 'dimnames' is NULL.
  if (is.null(dimnames)) {
    .Internal(matrix(data, nrow, ncol, byrow));
  } else {
    x <- .Internal(matrix(data, nrow, ncol, byrow));
    dimnames(x) <- dimnames;
    x;
  }
} # matrix()


Try the above again in a fresh R session with this patch applied and you'll get:

 print(gc())
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 136805  3.7 35  9.4   35  9.4
Vcells  81122  0.7 786432  6.0   473127  3.7
 x <- matrix(1, nrow=5000, ncol=5000)
 print(gc())
   used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   136919   3.7 35   9.4   35   9.4
Vcells 25081139 191.4   27989372 213.6 25081152 191.4
 x[1,1] <- 2
 print(gc())
   used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   136923   3.7 35   9.4   35   9.4
Vcells 25081140 191.4   29468840 224.9 25081276 191.4

Voila!

I talked to Luke Tierney about this and he thought the internal method
should be updated to take the dimnames argument, i.e.
.Internal(matrix(data, nrow, ncol, byrow, dimnames)).  However, until
that happens, may I suggest this simple patch/workaround to go into
R v2.6.0?

Cheers

Henrik


On 9/27/07, Petr Savicky [EMAIL PROTECTED] wrote:
 On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
  For the most part, doing anything to an R object results in its
  duplication. You generally have to do a lot of work to NOT copy an R
  object.

 Thank you for your response. Unfortunately, you are right. For example,
 the allocated memory reported by the top command on Linux may change during
 a session as follows:
   a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
   a[1,1] <- 0 # 3.0g
   gc() # 1.5g

 In the current application, I modify the matrix only using my own C code
 and only read it at the R level. So, the above is not a big problem for me
 (at least not now).

 However, there is a related thing, which could be a bug. The following
 code determines the value of NAMED field in SEXP header of an object:

   SEXP getnamed(SEXP a)
   {
   SEXP out;
   PROTECT(out = allocVector(INTSXP, 1));
   INTEGER(out)[0] = NAMED(a);
   UNPROTECT(1);
   return(out);
   }

 Now, consider the following session

   u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
   .Call("getnamed",u) # 1 (OK)

   length(u)
   .Call("getnamed",u) # 1 (OK)

   dim(u)
   .Call("getnamed",u) # 1 (OK)

   nrow(u)
   .Call("getnamed",u) # 2 (why?)

   u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
   .Call("getnamed",u) # 1 (OK)
   ncol(u)
   .Call("getnamed",u) # 2 (so, ncol does the same)

 Is this a bug?

 Petr Savicky.

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] modifying large R objects in place

2007-09-27 Thread Petr Savicky
Thank you very much for all the explanations, in particular for pointing
out that nrow is not a .Primitive, unlike dim, which is the
reason for the difference in their behavior. (I raised the question
of a possible bug due to this difference, not just because I was unsatisfied
with nrow.) Also, thanks for:

On Thu, Sep 27, 2007 at 05:59:05PM +0100, Prof Brian Ripley wrote:
[...]
 2) I expected NAMED on 'a' to be incremented by nrow(a): here is my 
 understanding.
 
 When you called nrow(a) you created another reference to 'a' in the 
 evaluation frame of nrow.  (At a finer level you first created a promise 
 to 'a' and then dim(x) evaluated that promise, which did SET_NAMED(SEXP) 
 = 2.)  So NAMED(a) was correctly bumped to 2, and it is never reduced.
 
 More generally, any argument to a closure that actually gets used will 
 get NAMED set to 2.
[...]

This explains a lot.

I appreciate also the patch to matrix by Henrik Bengtsson, which saved
me time formulating a further question just about this.

I do not know whether there is a reason to keep nrow and ncol from being
.Primitive, but if there is, the problem may be solved by rewriting
them as follows:

nrow <- function(...) dim(...)[1]
ncol <- function(...) dim(...)[2]

At least in my environment, the new versions preserved NAMED == 1.

A side effect is that this unifies the error messages generated
by too many arguments to nrow(x) and dim(x). Currently
  a <- matrix(1:6,nrow=2)
  nrow(a,a) # Error in nrow(a, a) : unused argument(s) (1:6)
  dim(a,a) # Error: 2 arguments passed to 'dim' which requires 1

Maybe other solutions exist as well.

Petr Savicky.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel