Re: [Rd] delayedAssign
On Wed, 26 Sep 2007, Gabor Grothendieck wrote:

> I thought that perhaps the behavior in the previous post, while
> inconsistent with the documentation, was not all that harmful, but I
> think it's related to the following, which is a potentially serious bug.

The previous discussion already established that as.list of an environment should not return a list with promises in it, as promises should not be visible at the R level. (Another loophole that needs closing is $ for environments.) So the behavior of results that should not exist is undefined, and I cannot see how any such behavior is a further bug, serious or otherwise.

> z is a list with a single numeric component, as the dput output verifies,

Except it isn't, as print or str verify. That might be a problem if z were an input these functions should expect, but it isn't.

> yet we cannot compare its first element to 7 without getting an error
> message. Later on we see that it's because it thinks that z[[1]] is of
> type promise

As z[[1]] is in fact of type promise, that would seem a fairly reasonable thing to think at this point ...

> and even force(z[[1]]) is of type promise.

which is consistent with what force is documented to do. The documentation is quite explicit that force does not do what you seem to be expecting. That documentation is from a time when delay() existed to produce promises at the R level, which was a nightmare because of all the peculiarities it introduced, which is why it was removed.

force is intended for one thing only -- replacing code like this:

    # I know the following line looks really stupid and you will be
    # tempted to remove it for efficiency but DO NOT: it is needed
    # to make sure that the formal argument y is evaluated at this
    # point.
    y <- y

with this:

    force(y)

which seems much clearer -- at least it suggests you look at the help page for force to see what it does. At this point promises should only ever exist in bindings in environments.
If we wanted lazy evaluation constructs more widely, there are really only two sensible options:

- The Scheme option, where a special function delay creates a deferred evaluation and another, called force in Scheme, forces the evaluation, but there is no implicit forcing; or
- The Haskell option, where data structures are created lazily, so z <- list(f(x)) would create a list with a deferred evaluation, but any attempt to access the value of z would force the evaluation. So printing z, for example, would force the evaluation, but y <- z[[1]] would not.

It is easy enough to create a Delay/Force pair that behaves like Scheme's with the tools available in R if that is what you want. Haskell, and other fully lazy functional languages, are very interesting but very different animals from R. For some reason you seem to be expecting some combination of Scheme and Haskell behavior.

Best,

luke

> f <- function(x) environment()
> z <- as.list(f(7))
> dput(z)
> structure(list(x = 7), .Names = "x")
> z[[1]] == 7
> Error in z[[1]] == 7 : comparison (1) is possible only for atomic and list types
> force(z[[1]]) == 7
> Error in force(z[[1]]) == 7 : comparison (1) is possible only for atomic and list types
> typeof(z)
> [1] "list"
> typeof(z[[1]])
> [1] "promise"
> typeof(force(z[[1]]))
> [1] "promise"
> R.version.string # Vista
> [1] "R version 2.6.0 beta (2007-09-23 r42958)"
>
> On 9/19/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
> > The last two lines of example(delayedAssign) give this:
> >
> > e <- (function(x, y = 1, z) environment())(1+2, y, {cat(" HO! "); pi+2})
> > (le <- as.list(e)) # evaluates the promises
> > $x
> > <promise: 0x032b31f8>
> > $y
> > <promise: 0x032b3230>
> > $z
> > <promise: 0x032b3268>
> >
> > which contrary to the comment appears unevaluated. Is the comment
> > wrong or is it supposed to return an evaluated result but doesn't?
> >
> > R.version.string # Vista
> > [1] "R version 2.6.0 alpha (2007-09-06 r42791)"

--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                    Phone:  319-335-3386
Department of Statistics and          Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                    email:  [EMAIL PROTECTED]
Iowa City, IA 52242                   WWW:    http://www.stat.uiowa.edu

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
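Luke's remark that a Scheme-style Delay/Force pair is easy to build with the tools available in R can be sketched with ordinary closures; the names Delay and Force below are illustrative, not part of any package:

```r
Delay <- function(expr) {
  # Capture the unevaluated expression and the caller's environment.
  e <- substitute(expr)
  env <- parent.frame()
  forced <- FALSE
  value <- NULL
  # Return a thunk: evaluation happens on the first call only,
  # after which the result is cached.
  function() {
    if (!forced) {
      value <<- eval(e, env)
      forced <<- TRUE
    }
    value
  }
}
Force <- function(p) p()

count <- 0
d <- Delay({ count <- count + 1; 42 })
# Nothing is evaluated until an explicit Force, and then only once:
Force(d)  # 42; count is now 1
Force(d)  # 42 again, from the cache; count stays 1
```

Unlike Haskell-style laziness, nothing here forces on access: printing d just shows a function, which is exactly the Scheme behavior described above.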
Re: [Rd] delayedAssign
Thanks for the explanation.

For lists, either (a) promises should be evaluated as they enter the list, or (b) promises should be evaluated as they exit the list (i.e. as they are compared, inspected, etc.). I gather the intent was (a), but it does not happen that way due to a bug in R. Originally I thought (b) would then occur, but my surprise was that it does not occur either, which is why I feel it's more serious than I had originally thought.

I think it's OK if promises only exist in environments and not lists. Items that would be on my wishlist would be the ability to do, at the R level, the two things mentioned previously (https://stat.ethz.ch/pipermail/r-devel/2007-September/046943.html) and, thirdly, an ability to get the evaluation environment, not just the expression, associated with a promise -- substitute only gets the expression.

Originally I thought I would need some or all of these wish items and then thought not, but I am back to the original situation again as I use them more and realize that they are at least important for debugging (it's very difficult to debug situations involving promises, as there is no way to inspect the evaluation environment, so you are never sure which environment a given promise is evaluating in) and possibly for writing programs as well.

On 9/27/07, Luke Tierney [EMAIL PROTECTED] wrote:
> On Wed, 26 Sep 2007, Gabor Grothendieck wrote:
> > I thought that perhaps the behavior in the previous post, while
> > inconsistent with the documentation, was not all that harmful, but I
> > think it's related to the following, which is a potentially serious bug.
>
> The previous discussion already established that as.list of an
> environment should not return a list with promises in it, as promises
> should not be visible at the R level. (Another loophole that needs
> closing is $ for environments.) So the behavior of results that should
> not exist is undefined, and I cannot see how any such behavior is a
> further bug, serious or otherwise.
Re: [Rd] rJava and RJDBC
Joe,

which version of R and RJDBC are you using? The behavior you describe should have been fixed in RJDBC 0.1-4. Please try the latest version from rforge:

install.packages("RJDBC",, "http://rforge.net/")

and please let me know if that solves your problem.

Cheers,
Simon

On Sep 26, 2007, at 10:03 PM, Joe W. Byers wrote:

> I am desperate for help. I am trying to get RJDBC and rJava 0.5 to work
> on both my Windows XP and Linux Red Hat EL5 server. On both I get a
> java.lang.ClassNotFoundException when calling JDBC(). My example is
>
> require(RJDBC)
> classPath <- 'C:\\libraries\\mysql-connector-java-5.1.3-rc\\mysql-connector-java-5.1.3-rc-bin.jar'
> driverClass <- c("com.mysql.jdbc.Driver")
> drv <- JDBC(c("com.mysql.jdbc.Driver"), classPath, "`")
>
> This returns a NULL value and a java exception.
>
> .jgetEx()
> [1] Java-Object{java.lang.ClassNotFoundException: com.mysql.jdbc.Driver}
>
> My java version is
>
> .jcall('java.lang.System', 'S', 'getProperty', 'java.version')
> [1] "1.6.0_02"
>
> When I use the java 1.5.0_11 JRE I have the same problem, but .jgetEx()
> gives
>
> .jgetEx()
> [1] Java-Object{}
>
> My class path is
>
> .jclassPath()
> [1] "C:\\PROGRA~1\\R\\library\\rJava\\java"
> [2] "."
> [3] "C:\\libraries\\mysql-connector-java-5.1.3-rc\\mysql-connector-java-5.1.3-rc-bin.jar"
> [4] "C:\\libraries\\xmlbeans-2.0.0-beta1\\lib\\xbean.jar"
> [5] "C:\\libraries\\POI\\poi-2.5.1-final-20040804.jar"
> [6] "C:\\libraries\\POI\\poi-contrib-2.5.1-final-20040804.jar"
> [7] "C:\\libraries\\POI\\poi-scratchpad-2.5.1-final-20040804.jar"
> [8] "C:\\Libraries\\PJM\\eDataFeed.jar"
> [9] "C:\\Libraries\\PJM\\webserviceclient.jar"
> [10] "C:\\Java\\Libraries\\QTJava.zip"
>
> My java.home is
>
> .jcall('java.lang.System', 'S', 'getProperty', 'java.home')
> [1] "C:\\Java\\jre1.6.0_02"
>
> I have tried breaking down the JDBC call as
>
> .jinit() # or .jinit(classPath)
> v <- .jcall("java/lang/ClassLoader", "Ljava/lang/ClassLoader;", "getSystemClassLoader")
> .jcall("java/lang/Class", "Ljava/lang/Class;", "forName", as.character(driverClass)[1], TRUE, v)
>
> to no avail. I have tried different versions of the mysql jar.
> I do not know if my java version is not compatible, my java settings
> are wrong, or I am just blind to the problem. This is the same for both
> my Windows XP and Red Hat EL5 server. I really appreciate any and all
> assistance.
>
> Thank you
> Joe
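For reference, a minimal RJDBC connection sketch once the right driver jar is on the class path; the jar path, database URL, user and password below are placeholders, not values from the thread:

```r
library(RJDBC)

# Placeholder path to the MySQL Connector/J jar -- adjust for your system.
drv <- JDBC("com.mysql.jdbc.Driver",
            "/path/to/mysql-connector-java-5.1.3-rc-bin.jar",
            identifier.quote = "`")

# JDBC() throws the ClassNotFoundException seen above only when the
# driver class cannot be found on the class path; with the jar in place
# the connection step looks like this (placeholder credentials):
con <- dbConnect(drv, "jdbc:mysql://localhost/test", "user", "password")
dbGetQuery(con, "SELECT 1")
dbDisconnect(con)
```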
Re: [Rd] modifying large R objects in place
On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
> For the most part, doing anything to an R object results in its
> duplication. You generally have to do a lot of work to NOT copy an R
> object.

Thank you for your response. Unfortunately, you are right. For example, the allocated memory determined by the top command on Linux may change during a session as follows:

a <- matrix(as.integer(1), nrow=14100, ncol=14100) # 774m
a[1,1] <- 0 # 3.0g
gc() # 1.5g

In the current application, I modify the matrix only using my own C code and only read it at the R level. So, the above is not a big problem for me (at least not now). However, there is a related thing, which could be a bug. The following code determines the value of the NAMED field in the SEXP header of an object:

SEXP getnamed(SEXP a)
{
    SEXP out;
    PROTECT(out = allocVector(INTSXP, 1));
    INTEGER(out)[0] = NAMED(a);
    UNPROTECT(1);
    return out;
}

Now, consider the following session:

u <- matrix(as.integer(1), nrow=5, ncol=3) + as.integer(0)
.Call("getnamed", u) # 1 (OK)
length(u)
.Call("getnamed", u) # 1 (OK)
dim(u)
.Call("getnamed", u) # 1 (OK)
nrow(u)
.Call("getnamed", u) # 2 (why?)

u <- matrix(as.integer(1), nrow=5, ncol=3) + as.integer(0)
.Call("getnamed", u) # 1 (OK)
ncol(u)
.Call("getnamed", u) # 2 (so, ncol does the same)

Is this a bug?

Petr Savicky.
Re: [Rd] modifying large R objects in place
In my previous email, I sent the example:

a <- matrix(as.integer(1), nrow=14100, ncol=14100) # 774m
a[1,1] <- 0 # 3.0g
gc() # 1.5g

This is misleading. The correct version is

a <- matrix(as.integer(1), nrow=14100, ncol=14100) # 774m
a[1,1] <- as.integer(0) # 1.5g
gc() # 774m

So, the object duplicates, but nothing more. The main part of my previous email (the question concerning a possible bug in the behavior of nrow(a) and ncol(a)) remains open.

Petr Savicky.
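The coercion that made the first version misleading can be checked directly with storage.mode(): assigning the double 0 converts the whole integer matrix to double (roughly doubling its memory), while as.integer(0) leaves the storage mode alone. A small sketch:

```r
a <- matrix(1L, nrow = 5, ncol = 3)
storage.mode(a)      # "integer"

a[1, 1] <- 0         # replacement value is double: the matrix is coerced
storage.mode(a)      # "double"

b <- matrix(1L, nrow = 5, ncol = 3)
b[1, 1] <- 0L        # integer replacement: storage mode is preserved
storage.mode(b)      # "integer"
```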
Re: [Rd] delayedAssign
On Thu, 27 Sep 2007, Gabor Grothendieck wrote:

> Thanks for the explanation.
>
> For lists, either (a) promises should be evaluated as they enter the
> list, or (b) promises should be evaluated as they exit the list (i.e.
> as they are compared, inspected, etc.).

What makes you conclude that this is what should happen? Again, promises are internal. We could, and maybe will, eliminate promises in favor of a mark on bindings in environments that indicates that they need to be evaluated. At the R level this would produce the same behavior as we currently (intend to) have.

If we allowed lazy structures outside of bindings, then I still don't see how (b) should happen. With Scheme-like semantics we would definitely NOT want this to happen; with Haskell-like semantics any attempt to look at the value (including printing) would result in evaluation (and replacing the promise/thunk/whatever by its value).

> I gather the intent was (a), but it does not happen that way due to a
> bug in R. Originally I thought (b) would then occur, but my surprise
> was that it does not occur either, which is why I feel it's more
> serious than I had originally thought.
>
> I think it's OK if promises only exist in environments and not lists.
> Items that would be on my wishlist would be the ability to do at the R
> level the two things mentioned previously
> https://stat.ethz.ch/pipermail/r-devel/2007-September/046943.html
> and, thirdly, an ability to get the evaluation environment, not just
> the expression, associated with a promise -- substitute only gets the
> expression.

I am still not persuaded that tools for inspecting environments are worth the time and effort required, but I am prepared to be.

Best,

luke
Re: [Rd] rJava and RJDBC
Simon Urbanek <simon.urbanek at r-project.org> writes:

> Joe,
>
> which version of R and RJDBC are you using? The behavior you describe
> should have been fixed in RJDBC 0.1-4. Please try the latest version
> from rforge:
>
> install.packages("RJDBC",, "http://rforge.net/")
>
> and please let me know if that solves your problem.
>
> Cheers,
> Simon

Simon,

Thank you so much. I have been working on this for a week. I had not been using rforge.net as a repository, only the defaults (usually the IL mirror), to get my package updates. This really rocks!

Again, thank you, and have a wonderful day.

Joe
Re: [Rd] modifying large R objects in place
Petr Savicky wrote:

> u <- matrix(as.integer(1), nrow=5, ncol=3) + as.integer(0)
> .Call("getnamed", u) # 1 (OK)
> length(u)
> .Call("getnamed", u) # 1 (OK)
> dim(u)
> .Call("getnamed", u) # 1 (OK)
> nrow(u)
> .Call("getnamed", u) # 2 (why?)
>
> u <- matrix(as.integer(1), nrow=5, ncol=3) + as.integer(0)
> .Call("getnamed", u) # 1 (OK)
> ncol(u)
> .Call("getnamed", u) # 2 (so, ncol does the same)
>
> Is this a bug?

No. It is an infelicity. The issues are that

1. length() and dim() call .Primitive directly, whereas nrow() and ncol() are real R functions.
2. NAMED records whether an object has _ever_ had 0, 1, or 2+ names.

During the evaluation of ncol(u), the argument x is evaluated, and at that point the object u is also named x in the evaluation frame of ncol(). A full(er) reference counting system might drop NAMED back to 1 when exiting ncol(), but currently R can only count up (and trying to find the conditions under which it is safe to reduce NAMED will make your head spin, believe me!).
--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
Re: [Rd] modifying large R objects in place
1) You implicitly coerced 'a' to be numeric and thereby (almost) doubled its size: did you intend to? Does that explain your confusion?

2) I expected NAMED on 'a' to be incremented by nrow(a): here is my understanding. When you called nrow(a) you created another reference to 'a' in the evaluation frame of nrow. (At a finer level, you first created a promise to 'a' and then dim(x) evaluated that promise, which did SET_NAMED(<SEXP>, 2).) So NAMED(a) was correctly bumped to 2, and it is never reduced. More generally, any argument to a closure that actually gets used will get NAMED set to 2.

Having too high a value of NAMED could never be a 'bug'. See the explanation in the R Internals manual:

    When an object is about to be altered, the named field is consulted.
    A value of 2 means that the object must be duplicated before being
    changed.

(Note that this does not say that it is necessary to duplicate, only that it should be duplicated whether necessary or not.)

3) Memory profiling can be helpful in telling you exactly what copies get made.

On Thu, 27 Sep 2007, Petr Savicky wrote:

> On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
> > For the most part, doing anything to an R object results in its
> > duplication. You generally have to do a lot of work to NOT copy an R
> > object.
>
> Thank you for your response. Unfortunately, you are right. For example,
> the allocated memory determined by the top command on Linux may change
> during a session as follows:
>
> a <- matrix(as.integer(1), nrow=14100, ncol=14100) # 774m
> a[1,1] <- 0 # 3.0g
> gc() # 1.5g
>
> In the current application, I modify the matrix only using my own C
> code and only read it at the R level. So, the above is not a big
> problem for me (at least not now). However, there is a related thing,
> which could be a bug.
--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
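The first point -- dim() is a primitive while nrow() is a real R function -- is easy to verify; nrow's entire body is a call to dim(), which is why calling it creates the extra reference to its argument:

```r
is.primitive(dim)    # TRUE: no closure frame, so no extra reference
is.primitive(nrow)   # FALSE: an ordinary closure wrapping dim()
body(nrow)           # essentially dim(x)[1]
```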
[Rd] Aggregate factor names
Hi all,

A suggestion derived from discussions amongst a number of R users in my research group: set the default column names produced by aggregate() equal to the names of the objects in the list passed to the 'by' argument.

For example, it is annoying to type

with(
  my.data
  ,aggregate(
    my.dv
    ,list(
      one.iv = one.iv
      ,another.iv = another.iv
      ,yet.another.iv = yet.another.iv
    )
    ,some.function
  )
)

to yield a data frame with names = c('one.iv','another.iv','yet.another.iv','x'), when this seems more economical:

with(
  my.data
  ,aggregate(
    my.dv
    ,list(
      one.iv
      ,another.iv
      ,yet.another.iv
    )
    ,some.function
  )
)

--
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University
Website: http://memetic.ca
Public calendar: http://icalx.com/public/informavore/Public

"The road to wisdom? Well, it's plain and simple to express: Err and err and err again, but less and less and less." - Piet Hein
Re: [Rd] Aggregate factor names
You can do this:

aggregate(iris[-5], iris[5], mean)

On 9/27/07, Mike Lawrence [EMAIL PROTECTED] wrote:
> Hi all,
>
> A suggestion derived from discussions amongst a number of R users in my
> research group: set the default column names produced by aggregate()
> equal to the names of the objects in the list passed to the 'by'
> argument.
Re: [Rd] Aggregate factor names
Understood, but my point is that the naming I suggest should be the default. One should not be 'punished' for being explicit in calling aggregate.

On 27-Sep-07, at 1:06 PM, Gabor Grothendieck wrote:
> You can do this:
>
> aggregate(iris[-5], iris[5], mean)
Re: [Rd] Aggregate factor names
You can do this too:

aggregate(iris[-5], iris["Species"], mean)

or this:

with(iris, aggregate(iris[-5], data.frame(Species), mean))

or this:

attach(iris)
aggregate(iris[-5], data.frame(Species), mean)

The point is that you already don't have to write x = x. The only reason you are writing it that way is that you are using list instead of data.frame. Just use data.frame or appropriate indexing as shown.

On 9/27/07, Mike Lawrence [EMAIL PROTECTED] wrote:
> Understood, but my point is that the naming I suggest should be the
> default. One should not be 'punished' for being explicit in calling
> aggregate.
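The naming behavior under discussion can be seen with iris: a bare unnamed list falls back to the placeholder Group.1, while data-frame-style indexing carries the column name through:

```r
# Unnamed list: the grouping column gets the placeholder name "Group.1".
r1 <- with(iris, aggregate(Sepal.Length, list(Species), mean))
names(r1)   # "Group.1" "x"

# One-column data frame: the name "Species" is preserved.
r2 <- aggregate(iris["Sepal.Length"], iris["Species"], mean)
names(r2)   # "Species" "Sepal.Length"
```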
[Rd] Unnecessary extra copy with matrix(..., dimnames=NULL) (Was: Re: modifying large R objects in place)
As others already mentioned, in your example you are first creating an integer matrix and then coercing it to a double matrix by assigning a (double) 0 to element [1,1].

However, even when correcting for this mistake, there is an extra copy created when using matrix(). Try this in a fresh vanilla R session:

> print(gc())
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 136684  3.7     350000  9.4   350000  9.4
Vcells  81026  0.7     786432  6.0   473127  3.7
> x <- matrix(1, nrow=5000, ncol=5000)
> print(gc())
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   136793   3.7     350000   9.4   350000   9.4
Vcells 25081043 191.4   27989266 213.6 25081056 191.4
> x[1,1] <- 2
> print(gc())
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   136797   3.7     350000   9.4   350000   9.4
Vcells 25081044 191.4   52830254 403.1 50081058 382.1

So, yes, in that x[1,1] <- 2 assignment an extra copy is created. It is related to the fact that there is a NAMED matrix object being created inside matrix(), cf. the last rows of matrix():

    x <- .Internal(matrix(data, nrow, ncol, byrow))
    dimnames(x) <- dimnames
    x

Here is a patch for matrix() that avoids this problem *when dimnames is NULL* (which is many times the case):

matrix <- function(data=NA, nrow=1, ncol=1, byrow=FALSE, dimnames=NULL) {
  data <- as.vector(data)
  if (missing(nrow)) {
    nrow <- ceiling(length(data)/ncol)
  } else if (missing(ncol)) {
    ncol <- ceiling(length(data)/nrow)
  }
  # Trick to avoid an extra copy in the case when 'dimnames' is NULL.
  if (is.null(dimnames)) {
    .Internal(matrix(data, nrow, ncol, byrow))
  } else {
    x <- .Internal(matrix(data, nrow, ncol, byrow))
    dimnames(x) <- dimnames
    x
  }
} # matrix()

Try the above again in a fresh R session with this patch applied and you'll get:

> print(gc())
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 136805  3.7     350000  9.4   350000  9.4
Vcells  81122  0.7     786432  6.0   473127  3.7
> x <- matrix(1, nrow=5000, ncol=5000)
> print(gc())
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   136919   3.7     350000   9.4   350000   9.4
Vcells 25081139 191.4   27989372 213.6 25081152 191.4
> x[1,1] <- 2
> print(gc())
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   136923   3.7     350000   9.4   350000   9.4
Vcells 25081140 191.4   29468840 224.9 25081276 191.4

Voila! I talked to Luke Tierney about this and he thought the internal method should be updated to take the dimnames argument, i.e. .Internal(matrix(data, nrow, ncol, byrow, dimnames)). However, until that happens, may I suggest this simple patch/workaround to go into R v2.6.0?

Cheers,

Henrik

On 9/27/07, Petr Savicky [EMAIL PROTECTED] wrote:
> On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
> > For the most part, doing anything to an R object results in its
> > duplication. You generally have to do a lot of work to NOT copy an R
> > object.
>
> Thank you for your response. Unfortunately, you are right. For example,
> the allocated memory determined by the top command on Linux may change
> during a session as follows:
>
> a <- matrix(as.integer(1), nrow=14100, ncol=14100) # 774m
> a[1,1] <- 0 # 3.0g
> gc() # 1.5g
>
> In the current application, I modify the matrix only using my own C
> code and only read it at the R level. So, the above is not a big
> problem for me (at least not now). However, there is a related thing,
> which could be a bug.
Re: [Rd] modifying large R objects in place
Thank you very much for all the explanations. In particular for pointing out that nrow is not a .Primitive, unlike dim, which is the reason for the difference in their behavior. (I raised the question of a possible bug due to this difference, not just being unsatisfied with nrow.) Also, thanks for:

On Thu, Sep 27, 2007 at 05:59:05PM +0100, Prof Brian Ripley wrote:
> [...]
> 2) I expected NAMED on 'a' to be incremented by nrow(a): here is my
> understanding. When you called nrow(a) you created another reference
> to 'a' in the evaluation frame of nrow. (At a finer level, you first
> created a promise to 'a' and then dim(x) evaluated that promise, which
> did SET_NAMED(<SEXP>, 2).) So NAMED(a) was correctly bumped to 2, and
> it is never reduced. More generally, any argument to a closure that
> actually gets used will get NAMED set to 2.
> [...]

This explains a lot. I also appreciate the patch to matrix by Henrik Bengtsson, which saved me time formulating a further question just about this.

I do not know whether there is a reason to keep nrow, ncol not .Primitive, but if there is, the problem may be solved by rewriting them as follows:

nrow <- function(...) dim(...)[1]
ncol <- function(...) dim(...)[2]

At least in my environment, the new versions preserved NAMED == 1. It has the side effect of unifying the error messages generated by too many arguments to nrow(x) and dim(x). Currently:

a <- matrix(1:6, nrow=2)
nrow(a,a) # Error in nrow(a, a) : unused argument(s) (1:6)
dim(a,a)  # Error: 2 arguments passed to 'dim' which requires 1

Maybe other solutions also exist.

Petr Savicky.
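Petr's proposed dim-based definitions can be checked against the shipped versions; nrow. and ncol. are illustrative names chosen here so as not to mask base R's functions:

```r
nrow. <- function(...) dim(...)[1]
ncol. <- function(...) dim(...)[2]

a <- matrix(1:6, nrow = 2)
# The rewritten versions agree with base nrow/ncol on matrices.
c(nrow.(a), ncol.(a))   # 2 3
```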