Re: [Rd] SVN vs DVCS
On 5/26/10 4:16 AM, Gabor Grothendieck wrote: Note that one can also use any of the dvcs systems without actually moving from svn by using the dvcs (or associated extension/addon) as an svn client or by using it on an svn checkout.

FWIW, I have been using git for several years now as my VCS of choice and use it for all svn-backed projects (R included) via git-svn. Some of the things I like:

- Being able to organize changes in local commits that can be revised, reordered, and rebased prior to publishing. Once I got in the habit of working this way, I simply can't imagine going back.
- Having quick access to the full repository history without network access/delay. Features for searching change history are more powerful (or easier for me to use) and I have found that useful as well.
- This may not be true any longer with more recent svn servers/clients, but aside from the initial repo clone, working via git-svn was noticeably faster than the straight svn client (!) -- I think related to how the tools organize the working copy and how many fstat calls they make.
- I find the log reviewing functionality much better suited to reviewing changes.

+ seth

-- Seth Falcon | @sfalcon | http://userprimary.net/
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Resolving functions using R's namespace mechanism can double runtime
On 4/27/10 1:16 PM, Dominick Samperi wrote: It appears that the runtime for an R script can more than double if a few references to a function foo() are replaced by more explicit references of the form pkgname::foo(). The more explicit references are of course required when two loaded packages define the same function. I can understand why use of this mechanism is not free in an interpreted environment like R, but the cost seems rather high.

`::` is a function, so there is going to be overhead. OTOH, there is no reason to pay for the lookup more than once. For example, at startup you could do:

    myfoo <- pkgname::foo

and then later call myfoo(); I don't think you will see the added cost. You can formalize the above approach in package code by renaming the function in the importFrom directive, where I believe you can do:

    importFrom(pkgname, myfoo=foo)

+ seth
Re: [Rd] suggestion how to use memcpy in duplicate.c
On 4/21/10 10:45 AM, Simon Urbanek wrote: Won't that miss the last incomplete chunk? (and please don't use DATAPTR on INTSXP even though the effect is currently the same) In general it seems that it depends on nt whether this is efficient or not, since calls to short memcpy are expensive (very small nt, that is). I ran some empirical tests to compare memcpy vs for() (x86_64, OS X) and the results were encouraging - depending on the size of the copied block the difference could be quite big:

- tiny block (ca. n = 32 or less) - for() is faster
- small block (n ~ 1k) - memcpy is ca. 8x faster
- as the size increases the gap closes (presumably due to RAM bandwidth limitations), so for n = 512M it is ~30%.

Of course this is contingent on the implementation of memcpy, compiler, architecture etc., and it will only matter if copying is what you do most of the time ...

Copying of vectors is something that I would expect to happen fairly often in many applications of R. Is for() faster on small blocks by enough that one would want to branch based on size?

+ seth
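If the answer turns out to be yes, the size-based branch could look roughly like the following minimal C sketch. This is illustrative only, not code from duplicate.c; the threshold constant is a placeholder to be tuned by measurement on the target platform, not a value established in this thread.

```c
#include <string.h>
#include <stddef.h>

/* Placeholder cutoff; the thread suggests the crossover is somewhere
 * around n = 32 on one x86_64/OS X setup, but it is platform-dependent. */
#define SMALL_COPY_THRESHOLD 32

/* Copy nt ints, using a plain loop for small blocks and memcpy for
 * larger ones. Both branches produce identical results; only the
 * speed differs. */
static void copy_ints(int *dst, const int *src, size_t nt)
{
    if (nt < SMALL_COPY_THRESHOLD) {
        for (size_t i = 0; i < nt; i++)
            dst[i] = src[i];
    } else {
        memcpy(dst, src, nt * sizeof(int));
    }
}
```

Whether the extra branch pays for itself is exactly the empirical question raised above; the only way to decide is to time both forms over the block-size distribution seen in real workloads.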
Re: [Rd] transient memory allocation and external pointers
On 4/20/10 6:24 AM, Melissa Jane Hubisz wrote: Thanks for the responses. Seth's example is indeed what I was trying (hoping) to do, and it seems to work fine on my system (ubuntu x86_64, R 2.10.1). But if it doesn't work for him, then that definitely answers my question. I guess I'll have to go the Calloc/Free route.

I expect that you could get your approach to not work on your system as well; you just have to try harder ;-) Memory-related bugs can be quite tricky, because incorrect code may run fine most of the time. To trigger a problem, you need the right pattern of allocation such that data will be written over the memory that your invalid external pointer points to.

+ seth
Re: [Rd] transient memory allocation and external pointers
On 4/19/10 8:59 AM, Simon Urbanek wrote: On Apr 19, 2010, at 10:39 AM, Melissa Jane Hubisz wrote: Hello, The Writing R Extensions manual section 6.1.1 describes the transient memory allocation function R_alloc, and states that memory allocated by R_alloc is automatically freed after the .C or .Call function is completed. However, based on my understanding of R's memory handling, as well as some test functions I have written, I suspect that this is not quite accurate. If the .Call function returns an external pointer to something created with R_alloc, then this object seems to stick around after the .Call function is completed, and is subject to garbage collection once the external pointer object is removed.

Yes, because the regular rules for the lifetime of an R object apply since it is in fact an R object. It is subject to garbage collection, so if you assign it anywhere its lifetime will be tied to that object (in your example an EXTPTRSXP).

I may be misunderstanding the question, but I think the answer is actually that it is *not* safe to put memory allocated via R_alloc into the external pointer address of an EXTPTRSXP. Here's what I think Melissa is doing:

    SEXP make_test_xp(SEXP s)
    {
        SEXP ans;
        const char *s0 = CHAR(STRING_ELT(s, 0));
        char *buf = (char *) R_alloc(strlen(s0) + 1, sizeof(char));
        memcpy(buf, s0, strlen(s0) + 1);
        ans = R_MakeExternalPtr(buf, R_NilValue, R_NilValue);
        return ans;
    }

The memory allocated by R_alloc is released at the end of the .Call via vmaxset(vmax). Using R_alloc in this way will lead to memory corruption (it does for me when I made a simple test case). For memory that really is external (not a SEXP), you should instead use Calloc and register a finalizer for the external pointer that will do any required cleanup and then call Free. If instead you want to have an externally managed SEXP, you could put it in the protected slot of the external pointer, but then you should allocate it using standard R allocation functions.
+ seth
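The failure mode is easier to see with a toy arena allocator. This is an analogy only, not R's actual implementation: the arena reset plays the role of vmaxset at the end of the .Call, and a pointer stashed across the reset silently sees whatever the next allocation writes.

```c
#include <string.h>
#include <stddef.h>

/* Toy bump allocator: hand out bytes from a fixed buffer. */
static char arena[1024];
static size_t arena_top = 0;

static void *arena_alloc(size_t n)
{
    void *p = arena + arena_top;
    arena_top += n;
    return p;
}

/* Analogue of vmaxset(vmax) at the end of a .Call: every pointer
 * obtained since the matching "vmaxget" is now dangling. */
static void arena_reset(void)
{
    arena_top = 0;
}
```

A pointer saved before arena_reset() and dereferenced afterwards still points into the buffer, so nothing crashes immediately; it simply reads whatever was allocated next, which is exactly the kind of silent corruption described above.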
Re: [Rd] generic '[' for a non-exported class
On 4/7/10 1:09 AM, Christophe Genolini wrote: Hi all, I define an S4 class 'foo'. I define '[' and '[<-' for it. I do not want to export foo, so I do not put it in the NAMESPACE. I do not want to export '[' and '[<-' either (since the user cannot use foo, there is no reason to give him access to '[' for foo). But R CMD check does not agree with me and reports an error: Undocumented S4 methods: generic '[' and siglist 'foo'; generic '[<-' and siglist 'foo'. Any solution?

You can document these on an internal API Rd page. Create an Rd file like yourPkg-internal-api.Rd and add the appropriate \alias{} lines to it.

+ seth
Re: [Rd] as(1:4, numeric) versus as.numeric(1:4, numeric)
On 3/31/10 4:52 PM, John Chambers wrote: The example is confusing and debatable, but not an obvious bug.

And your presentation of it is the cause of much of the confusion (unintentionally I'm sure). To restate the issue (I think): In a new R session, if you happen to call

    selectMethod("coerce", c("integer", "numeric"))

*before* having made a call like as(1:4, "numeric"), then there is a side-effect of creating definition A of the integer -> numeric coerce method. From this point forward, all calls to as(x, "numeric") when x is integer will return as.numeric(x). If instead you do not call selectMethod, then when calling as(x, "numeric") for x integer you get definition B, the documented behavior, which simply returns x. Presumably there are other similar cases where this will be an issue.

So while I agree this could be considered obscure, it qualifies as a bug in my book. It seems desirable that selectMethod not change the state of the system in a user-visible fashion. And calling selectMethod, or any other function, should not alter dispatch unless documented to do so.

I'm also suspicious of the behavior of the strict argument:

    > class(as(1:4, "numeric"))
    [1] "integer"
    > class(as(1:4, "numeric", strict = TRUE))
    [1] "integer"
    > class(as(1:4, "numeric", strict = FALSE))
    [1] "integer"

Is that intended?

+ seth
Re: [Rd] Difference Linux / Windows
On 3/31/10 1:12 PM, Christophe Genolini wrote: Hi the list, I am writing a package that happens to not be compatible with linux because I did not know that the function savePlot was available only on Windows. Is there a list of incompatible functions? How can I get this kind of information?

One way is to obtain a copy of the R sources and then grep the Rd files for '#ifdef'. I don't claim this is convenient. There has been discussion, and I believe general consensus, that we'd like to eliminate the conditional documentation. This requires editing the Rd files to make the contents sensible (you can't just remove the #ifdefs). Patches along these lines would be welcome.

+ seth
Re: [Rd] update.packages(1)
On 3/27/10 1:43 PM, Duncan Murdoch wrote: On 25/03/2010 3:16 PM, Arni Magnusson wrote: I'm relaying a question from my institute's sysadmin: Would it be possible to modify update.packages() and related functions so that 'lib.loc' accepts integer values to specify a library from the .libPaths() vector? Many Linux users want to update all user packages (inside the R_LIBS_USER directory, e.g. ~/r/library) and none of the system packages (inside the /usr directory, e.g. /usr/lib64/R/library), because they don't have write privileges to update the system packages. Currently, this can be done by pressing 'y RET' for all the user packages and 'RET' for all the system packages. This is hard work and requires careful reading when there are dozens of packages.

Another way is to run update.packages(Sys.getenv("R_LIBS_USER")) or:

    update.packages(.libPaths()[1])

You could also save some work by putting ask=FALSE, or ask="graphics", in as another argument. But isn't it easy enough to write your own function as a wrapper to update.packages, suiting your own local conventions? It seems like a bad idea to make update.packages too friendly, when there are several different friendly front-ends for it already (e.g. the menu entries in the Windows or MacOS GUIs).

But it would be nicer for the user to type update.packages(1), using a 'pos'-like notation to indicate the first element of the .libPaths() vector. --- A separate but related issue is that it would be nice if the R_LIBS_USER library would be the first library by default. Currently, my sysadmin must use Rprofile.site to shuffle the .libPaths() to make R_LIBS_USER first, which seems like a sensible default when it comes to install.packages() and remove.packages().

I'm confused. AFAICT, R_LIBS_USER _is_ put first. Following the advice in the Admin manual, I created a directory matching the default value of R_LIBS_USER (Sys.getenv("R_LIBS_USER") to see it). Then when I start R, I get:

    > .libPaths()
    [1] "/home/sfalcon/R/x86_64-unknown-linux-gnu-library/2.11"
    [2] "/home/sfalcon/build/rd/library"

Isn't that what you want?
Re: [Rd] list_files() memory corruption?
On 3/20/10 2:03 PM, Seth Falcon wrote: On 3/20/10 1:36 PM, Alistair Gee wrote: I fixed my build problems. I also noticed that my patch wasn't correct, so I have attached a new version. This fix still grows the vector by doubling it until it is big enough, but the length is reset to the correct size at the end once it is known. This fix differs from the existing fix in subversion in the following scenario:

1. Create file Z in a directory with one other file named Y.
2. Call dir() to retrieve the list of files.
3. dir() counts 2 files.
4. While dir() is executing, some other process creates file X in the directory.
5. dir() retrieves the list of files, stopping after 2 files. But by chance, it retrieves files X and Y (but not Z).
6. dir() returns files X and Y, which could be misinterpreted to mean that file Z does not exist.

In contrast, with the attached fix, dir() would return all 3 files.

I think the scenario you describe could happen with either version. Once you've read the files in a directory, all bets are off. Anything could happen between the time you readdir() and return results back to the user. I agree, though, that avoiding two calls to readdir narrows the window.

Also, the existing fix in subversion doesn't seem to handle the case where readdir() returns fewer files than were originally counted, as it doesn't decrease the length of the vector.

Yes, that's a limitation of the current fix. Have you run 'make check-devel' with your patch applied? Have you run any simple tests for using dir() or list.files() with recursive=TRUE on a reasonably large directory and compared times and memory use reported by gc()? It is often the case that writing the patch is the easy/quick part, and making sure that one hasn't introduced new infelicities or unintended behavior is the hard part. I will try to take another look at your latest patch.

I've applied a modified version of your patch. In the testing that I did, avoiding the counting step resulted in almost 2x faster times for large directory listings with recursive=TRUE, at the cost of a bit more memory. The code also now includes a check for user interrupt, so that you can C-c out of a dir/list.files call more quickly. Thanks for putting together the patch.

+ seth
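The grow-then-trim strategy discussed in this thread can be sketched with a plain C dynamic array. This is an illustrative sketch only, not the code from platform.c: capacity doubles as items are appended (amortized O(1) per append, at the cost of up to 2x transient memory, which is the trade-off questioned above), and a final trim resets the length to the true count.

```c
#include <stdlib.h>

/* Illustrative growable int array; names are not R's internals. */
typedef struct {
    int *data;
    size_t len;   /* items stored           */
    size_t cap;   /* items allocated        */
} IntVec;

static void vec_push(IntVec *v, int x)
{
    if (v->len == v->cap) {
        size_t newcap = v->cap ? v->cap * 2 : 16;  /* double on overflow */
        int *p = realloc(v->data, newcap * sizeof(int));
        if (!p) abort();                           /* toy error handling */
        v->data = p;
        v->cap = newcap;
    }
    v->data[v->len++] = x;
}

/* Once the true count is known, shrink the allocation to fit. */
static void vec_trim(IntVec *v)
{
    if (v->cap == v->len)
        return;
    int *p = realloc(v->data, v->len * sizeof(int));
    if (p)
        v->data = p;
    v->cap = v->len;
}
```

In R's case the "array" is a STRSXP grown via lengthgets, so each growth step also copies SEXP data, which is why the thread weighs the single-pass approach against the extra allocation and copying.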
Re: [Rd] Suggestion: Not having to export .conflicts.OK in name spaces
On 3/22/10 3:57 AM, Martin Maechler wrote: SF == Seth Falcon s...@userprimary.net on Fri, 19 Mar 2010 13:47:17 -0700 writes:

SF On 3/17/10 9:11 AM, Henrik Bengtsson wrote: Currently library() and attach() fail to locate an existing '.conflicts.OK' in a package with a name space, unless it is exported. Since there should be little interest in exporting '.conflicts.OK' otherwise, one may argue that those methods should look for '.conflicts.OK' even if it is not exported.

SF I guess I agree that there is no real value in forcing .conflicts.OK to be exported.

so do I. So I guess we agree that Henrik's patch would be worth applying. @Henrik: if you resend your patch with the additions for attach, I will see about putting it in.

SF OTOH, this seems like a dubious feature to begin with. When is it a good idea to use it?

in cases where the package author thinks (s)he knows what (s)he is doing; e.g. in the case of Matrix, I could argue that I know about the current conflicts, and I would *not* want the users of my package to be intimidated by warnings about maskings...

I can't say that this convinces me that .conflicts.OK is OK. Are there package authors who realize they do not know what they are doing well enough to keep the warning messages? :-P

+ seth
Re: [Rd] list_files() memory corruption?
On 3/20/10 1:36 PM, Alistair Gee wrote: I fixed my build problems. I also noticed that my patch wasn't correct, so I have attached a new version. This fix still grows the vector by doubling it until it is big enough, but the length is reset to the correct size at the end once it is known. This fix differs from the existing fix in subversion in the following scenario:

1. Create file Z in a directory with one other file named Y.
2. Call dir() to retrieve the list of files.
3. dir() counts 2 files.
4. While dir() is executing, some other process creates file X in the directory.
5. dir() retrieves the list of files, stopping after 2 files. But by chance, it retrieves files X and Y (but not Z).
6. dir() returns files X and Y, which could be misinterpreted to mean that file Z does not exist.

In contrast, with the attached fix, dir() would return all 3 files.

I think the scenario you describe could happen with either version. Once you've read the files in a directory, all bets are off. Anything could happen between the time you readdir() and return results back to the user. I agree, though, that avoiding two calls to readdir narrows the window.

Also, the existing fix in subversion doesn't seem to handle the case where readdir() returns fewer files than were originally counted, as it doesn't decrease the length of the vector.

Yes, that's a limitation of the current fix. Have you run 'make check-devel' with your patch applied? Have you run any simple tests for using dir() or list.files() with recursive=TRUE on a reasonably large directory and compared times and memory use reported by gc()? It is often the case that writing the patch is the easy/quick part, and making sure that one hasn't introduced new infelicities or unintended behavior is the hard part. I will try to take another look at your latest patch.

+ seth
Re: [Rd] DESCRIPTION: Imports: assertion of version?
On 3/19/10 6:13 AM, Henrik Bengtsson wrote: Hi, from 'Writing R Extensions' [R version 2.11.0 Under development (unstable) (2010-03-16 r51290)] one can read: The optional `Imports' field lists packages whose name spaces are imported from but which do not need to be attached. [...] Versions can be specified, but will not be checked when the namespace is loaded. Is it a design decision that version specifications are not asserted for packages under Imports:, or is it a lack of implementation? If a design decision, under what use cases would you want to specify the version but not validate it? Is it simply because there is no mechanism for tracking the origin/package of the code importing the other package, and hence we cannot know which DESCRIPTION file to check against?

I'm not aware of any use case in which the current lack of checking is a feature. I would be interested in a patch (with testing) for this.

+ seth
Re: [Rd] Suggestion: Not having to export .conflicts.OK in name spaces
On 3/17/10 9:11 AM, Henrik Bengtsson wrote: Currently library() and attach() fail to locate an existing '.conflicts.OK' in a package with a name space, unless it is exported. Since there should be little interest in exporting '.conflicts.OK' otherwise, one may argue that those methods should look for '.conflicts.OK' even if it is not exported.

I guess I agree that there is no real value in forcing .conflicts.OK to be exported.

OTOH, this seems like a dubious feature to begin with. When is it a good idea to use it?

+ seth
Re: [Rd] list_files() memory corruption?
On 3/17/10 7:16 AM, Alistair Gee wrote: Yes. I had noticed that R occasionally segfaults (especially when I run many concurrent R processes), so I used valgrind to log every use of R. In the valgrind logs, I tracked the problem to list_files(). I attached a patch to platform.c (for trunk). Unfortunately, I am having trouble building R from the subversion trunk--it is taking a very long time decompressing/installing the recommended packages--so I haven't been able to verify the fix yet. But my version of platform.c does compile, and it does simplify the code b/c count_files() is no longer needed.

Hmm, I see that you grow the vector containing filenames by calling lengthgets and doubling the length. I don't see where you clean up before returning -- it seems likely you will end up returning a vector that is too long. And there are some performance characteristics to consider in terms of both run time and memory profile. Does making a single pass through the files make up for the allocations/data copying that result from lengthgets? Is it worth possibly requiring twice the memory in the worst case?

+ seth
Re: [Rd] Segfault Problem c++ R interface (detailed)
Hi, The first thing to observe is that you are calling RSymbReg via .Call, but that function does not return a SEXP, as is required by the .Call interface.

+ seth
Re: [Rd] list_files() memory corruption?
Hi Alistair, On 3/12/10 4:37 PM, Alistair Gee wrote: I am using R-2-10 from subversion. In the implementation of do_listfiles() in platform.c, it appears to allocate a vector of length count, where count is calculated by count_files(). It then proceeds to call list_files(), passing in the vector but not the value of count. Yet list_files() doesn't seem to check the length of the vector that was allocated. What happens if a new file is added to the file system between the call to count_files() and list_files()? Doesn't this write past the length of the allocated vector?

Good catch. I've added a length check to prevent a problem.

Cheers, + seth
Re: [Rd] list_files() memory corruption?
On 3/15/10 8:37 PM, Alistair Gee wrote: I think I have a fix that avoids the problem by just growing the vector as necessary as the directory is traversed (and no longer uses count_files()). I don't have access to the code at the moment, but I should be able to post the patch tomorrow. Is there interest in my patch?

I'm curious to know if this is a problem you have encountered while using R. My initial thought is that there isn't much benefit in making this part of the code smarter. If your patch simplifies things, I'd be more interested.

+ seth
Re: [Rd] [PATCH] R ignores PATH_MAX and fails in long directories (PR#14228)
On 3/11/10 12:45 AM, Henrik Bengtsson wrote: Thanks for the troubleshooting. I just want to second this patch; it would be great if PATH_MAX could be used everywhere.

The patch, or at least something quite similar, was applied in r51229.

+ seth
Re: [Rd] shash in unique.c
On 3/5/10 4:40 AM, Matthew Dowle wrote: Thanks a lot. Quick and brief responses below...

Duncan Murdoch murd...@stats.uwo.ca wrote in message news:4b90f134.6070...@stats.uwo.ca... Matthew Dowle wrote: I was hoping for a 'yes', 'no', 'maybe' or 'bad idea because ...'. No response resulted in a retry() after a Sys.sleep(10 days). If it's a yes or maybe then I could proceed to try it, test it, and present the test results and timings to you along with the patch. It would be on 32-bit Ubuntu first, and I would need to either buy, rent time on, or borrow a 64-bit machine to be able to then test there, owing to the nature of the suggestion. If it's a no, bad idea because..., or we were already working on it, or better, then I won't spend any more time on it. Matthew

Matthew Dowle mdo...@mdowle.plus.com wrote in message news:hlu4qh$l7...@dough.gmane.org... Looking at shash in unique.c, from R-2.10.1, I'm wondering if it makes sense to hash the pointer itself rather than the string it points to? In other words, could the SEXP pointer be cast to unsigned int and the usual scatter be called on that as if it were integer?

Two negative but probably not fatal issues: Pointers and ints are not always the same size. In Win64, ints are 32 bits, pointers are 64 bits. (Can we be sure there is some integer type the same size as a pointer? I don't know, ask a C expert.)

No, we can't be sure. But we could test at runtime, and if the assumption wasn't true, then revert to the existing method.

I think the idea is, on the whole, a reasonable one, and I would be inclined to apply a patch if it demonstrated some measurable performance improvement. For the 32-bit vs 64-bit issue, I think we could detect it and in the 64-bit case take something like:

    ((int) p) ^ ((int)(p >> 32))

We might want to save the hash to disk. On restore, the pointer-based hash would be all wrong. (I don't know if we actually do ever save a hash to disk.)

The hash table in unique.c appears to be a temporary private hash, different from the global R_StringHash. This private hash appears to be used only while the call to unique runs, then free'd. That's my understanding anyway. The suggestion is not to alter the global R_StringHash in any way at all, which is the one that might be saved to disk now or in the future.

I agree with your reading: this is a temporary hash table, and there would be little reason to want to save it (it is not saved now).

+ seth
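The xor-fold suggested above can be written portably by going through uintptr_t, with the shift guarded so it is only compiled where pointers are wider than 32 bits. This is an illustrative sketch of the idea, not code from unique.c.

```c
#include <stdint.h>

/* Fold a pointer into 32 bits for hashing: low word XOR high word on
 * 64-bit platforms, just the value itself on 32-bit platforms. The
 * result would then feed the usual integer scatter function. */
static uint32_t fold_ptr(const void *p)
{
    uintptr_t v = (uintptr_t) p;
#if UINTPTR_MAX > 0xffffffffu
    return (uint32_t) v ^ (uint32_t)(v >> 32);
#else
    return (uint32_t) v;
#endif
}
```

Note that, as discussed in the thread, this only makes sense for a transient table: the fold is stable within one process, but pointer values (and hence the hash) change across sessions, so such a table must never be saved to disk.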
Re: [Rd] Rubbish values written with zero-length vectors (PR#14217)
On 2/20/10 7:50 AM, Peter Dalgaard wrote: You don't want to understand, believe me! ;-) It's a bug, probably not the very worst kind, but accessing memory that isn't yours is potentially harmful (and writing to it is considerably worse).

Looks like the issue only concerns the right hand side; nothing to do with the auto-expansion of v. I also get:

    > v <- integer(0)
    > u <- integer(1)
    > u[[2]] <- v
    > u
    [1]         0 142000760
    > u[[1]] <- v
    > u
    [1] 142000760 142000760
    > a <- 1
    > a[[1]] <- v
    > a
    [1] 142000760

I'm thinking this should be an error, similar to:

    > v <- 1
    > v[[1]] <- integer(3)
    Error in v[[1]] <- integer(3) :
      more elements supplied than there are to replace

but instead "not enough elements supplied". Perhaps:

    > v[[1]] <- integer()
    Error in v[[1]] <- integer() : [[ ]] replacement has zero length

The code in do_subassign2_dflt currently does not check for a zero-length replacement in the nsubs == 1 case. I think we want:

    @@ -1529,6 +1532,8 @@ do_subassign2_dflt(SEXP call, SEXP op, SEXP args, SEXP rho)
         if (nsubs == 0 || CAR(subs) == R_MissingArg)
             error(_("[[ ]] with missing subscript"));
         if (nsubs == 1) {
    +        if (length(y) == 0)
    +            error(_("[[ ]] replacement has zero length"));
             offset = OneIndex(x, thesub, length(x), 0, &newname,
                               recursed ? len-1 : -1, R_NilValue);
             if (isVectorList(x) && isNull(y)) {
                 x = DeleteOneVectorListItem(x, offset);

+ seth
Re: [Rd] R_LIBS_USER bugs
Hi, On 2/16/10 10:31 AM, Jens Elkner wrote: Having currently a big problem with R 2.10.1 vanilla (Solaris): as soon as the R_LIBS_USER env var gets bigger than 1023 chars, R completely ignores it and uses the default:

    > Sys.getenv('R_LIBS_USER')
                                            R_LIBS_USER
    ${R_LIBS_USER-~/R/i386-pc-solaris2.11-library/2.10}

I guess the first question is, why do you need such a long list of library directories?

I see the same thing with R-devel on OS X. I can set R_LIBS_USER from within R using Sys.setenv to a value longer than 1024 and retrieve it again. But if I have such a value in my shell, it gets overwritten. I'm not yet sure what is going on.

+ seth
Re: [Rd] Unexpected behaviour of x[i] when i is a matrix, on Windows
On 2/12/10 10:12 AM, Peter Ehlers wrote: You're comparing 2.10.0 on Windows with 2.11.0 on Linux. Have you tried 2.11.0 on Windows? => same result as on Linux.

Indeed, this is new functionality added to R-devel (5 Jan). Indexing an n-dim array with an n-column matrix used to be supported only when the matrix contained integers. Character matrices are now supported and map to the dimnames of the array. Here's the NEWS entry:

    o  n-dimensional arrays with dimension names can now be indexed
       by an n-column character matrix. The indices are matched
       against the dimension names. NA indices are propagated to
       the result. Unmatched values are not allowed and result in
       an error.

Cheers, + seth
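The name-matching lookup the NEWS entry describes can be sketched in C roughly as follows. This is an illustration of the idea only (function and parameter names are assumptions, not R's internals), though the column-major offset computation does match how R lays out arrays.

```c
#include <string.h>
#include <stddef.h>

/* Map one row of a character index matrix to a zero-based position in
 * a column-major n-dim array: match each name against that dimension's
 * dimnames, then accumulate offset = sum(index[d] * stride[d]).
 * Returns -1 if any name is unmatched (R signals an error instead). */
static ptrdiff_t name_offset(const char **names,      /* one name per dim  */
                             const char ***dimnames,  /* per-dim name sets */
                             const size_t *dimlen,    /* array dimensions  */
                             size_t ndim)
{
    ptrdiff_t offset = 0, stride = 1;
    for (size_t d = 0; d < ndim; d++) {
        ptrdiff_t k = -1;
        for (size_t j = 0; j < dimlen[d]; j++) {
            if (strcmp(names[d], dimnames[d][j]) == 0) {
                k = (ptrdiff_t) j;
                break;
            }
        }
        if (k < 0)
            return -1;           /* unmatched name */
        offset += k * stride;
        stride *= (ptrdiff_t) dimlen[d];
    }
    return offset;
}
```

For a 2x3 matrix with rownames {"a","b"} and colnames {"x","y","z"}, the row ("b","z") resolves to index (1, 2) and hence column-major offset 1 + 2*2 = 5, i.e. the last element, just as m["b","z"] would in R.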
Re: [Rd] Compiling R projects with multiple external libraries
On 2/11/10 9:43 AM, rt wrote: Hi, I have just learned how to compile and link libraries using make and how to create R projects using R CMD build or INSTALL. My understanding of both is somewhat limited, hence the question. I have a main library written in C which depends on other external libraries. The main library is to be called from R using .Call. The goal is to create a single R project that will compile all the external libraries, the main library, and the R-C wrappers, and install it. I am unsure about the proper structure of R project directories and the general workflow such that: (a) the external libraries and the main library are built first using the make setup that I already have; (b) the R-C wrapper is compiled and installed using R CMD INSTALL. I understand that there are issues using Makefiles and that there are preferred ways of doing these things. I am not sure how to use Makevars instead of a Makefile for this purpose. Any help, and in particular pointers to examples of R packages with multiple external libraries, would be appreciated.

Section 1.2.1 "Using Makevars" in WRE (the R-ext manual) has some detail on this and suggests looking at fastICA for an example. Quote from the manual: If you want to create and then link to a library, say using code in a subdirectory, use something like

    .PHONY: all mylibs

    all: $(SHLIB)
    $(SHLIB): mylibs

    mylibs:
            (cd subdir; make)

+ seth
Re: [Rd] src/main/platform.c (PR#14198)
On 1/28/10 3:50 AM, a.r.runna...@kent.ac.uk wrote: At line 312 in src/main/platform.c (at the latest svn revision, 51039):

    if (length(tl) >= 1 || !isNull(STRING_ELT(tl, 0)))

should not '||' read '&&'? Likewise four lines later.

Thanks, I'll fix this up.

+ seth
Re: [Rd] calling setGeneric() twice
On 1/19/10 10:01 AM, Ross Boylan wrote: Is it safe to call setGeneric twice, assuming some setMethod's for the target function occur in between? By safe I mean that all the setMethod's remain in effect, and the 2nd call is, effectively, a no-op. ?setGeneric says nothing explicit about this behavior that I can see. It does say that if there is an existing implicit generic function it will be (re?)used. I also tried ?Methods, google, and the mailing list archives. I looked at the code for setGeneric, but I'm not confident how it behaves. It doesn't seem to do a simple return of the existing value if a generic already exists, although it does have special handling for that case. The other problem with looking at the code--or running tests--is that they only show the current behavior, which might change later. This came up because of some issues with the sequencing of code in my package. Adding duplicate setGeneric's seems like the smallest, and therefore safest, change if the duplication is not a problem.

I'm not sure of the answer to your question, but I think it is the wrong question :-) Perhaps you can provide more detail on why you are using multiple calls to setGeneric. That seems like a very odd thing to do.

+ seth
Re: [Rd] calling setGeneric() twice
On 1/19/10 11:19 AM, Ross Boylan wrote: If files that were read in later in the sequence extended an existing generic, I omitted the setGeneric(). I had to resequence the order in which the files were read to avoid some undefined slot classes warnings. The resequencing created other problems, including some cases in which I had a setMethod without a previous setGeneric. I have seen the advice to sequence the files so that class definitions, then generic definitions, and finally function and method definitions occur. I am trying not to do that for two reasons. First, I'm trying to keep the changes I make small to avoid introducing errors. Second, I prefer to keep all the code related to a single class in a single file. If at first you do not get the advice you want, ask again! :-) Perhaps you could do something like:

if (!isGeneric("blah")) {
    setGeneric("blah", ...)
}

I would expect setGeneric to create a new generic function and nuke/mask methods associated with the generic that it replaces. Some of the files were intended for free-standing use, and so it would be useful if they could retain setGeneric()'s even if I also need an earlier setGeneric to make the whole package work. I am also working on a python script to extract all the generic function definitions (that is, setGeneric()), just in case. Perhaps another option is to group all of the generics together into a package and reuse that? Unless you are using valueClass, I don't think you will need any class definitions. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
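A minimal, self-contained sketch of the isGeneric() guard suggested above; the generic name 'describe' and class 'Thing' are invented for illustration:

```r
library(methods)

## Only create the generic if one does not already exist, so sourcing
## this file a second time is effectively a no-op for the generic and
## does not clobber methods already registered on it.
if (!isGeneric("describe")) {
    setGeneric("describe", function(x, ...) standardGeneric("describe"))
}

setClass("Thing", representation(label = "character"))
setMethod("describe", "Thing", function(x, ...) paste("Thing:", x@label))

describe(new("Thing", label = "demo"))  # "Thing: demo"
```

Running the `if (!isGeneric(...))` block again leaves the "describe" method for "Thing" in place.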
Re: [Rd] optional package dependency
On 1/15/10 7:51 AM, Uwe Ligges wrote: the Windows checks for CRAN run with that setting, i.e. _R_CHECK_FORCE_SUGGESTS_=false Hence the multicore issue mentioned below actually does not exist. I did not know that the Windows checks for CRAN used this setting. My concern was initiated by a Bioconductor package developer wanting to use multicore, and I mistakenly thought the issue would exist for CRAN as well. Bioconductor currently uses the default configuration for check on all platforms. For the CRAN case, there is no immediate problem. While there isn't an issue at hand, the approach still seems lacking. What happens when there is a Windows-only package that folks want to optionally use? Perhaps public repositories should then not force suggests for any platform (do they already?) -- I think that is a reasonable and simple solution. But in that case, perhaps the default value should change. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] optional package dependency
On 1/15/10 7:47 AM, Simon Urbanek wrote: On Jan 15, 2010, at 10:22 , Seth Falcon wrote: I believe another option is:

pkg <- "somePkg"
pkgAvail <- require(pkg, character.only = TRUE)
if (pkgAvail) ... else ...

That is not an option - that is the code you usually use with Suggests: (except for the pkg assignment which is there I presume to obscure things). Unfortunately, it _is_ an option, just not a good one :-) Some packages need to dynamically load other packages (think data packages) and they will not know ahead of time what packages they will load. So there has to be some sort of loop-hole in the check logic. In legitimate cases, this is not obscuring anything. In this case, I think we agree the use would not be legitimate. I'm less and less convinced that the force suggests behavior is useful to anyone. Package repositories can easily attempt to install all suggests and so packages will get complete testing. Package authors should be responsible enough to test their code with and without optional features. The slight convenience for an author of knowing that optional packages are missing is at least equally balanced by the slight inconvenience of having to change the check configuration in order to test the case of missing suggests. Anyway... __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] optional package dependency
On 1/15/10 12:19 AM, Kurt Hornik wrote: Jeff Ryan writes: Hi Ross, The quantmod package makes available routines from a variety of contributed packages, but gets around your issues with a bit of, um, trickery. Take a look here (unless your name is Kurt ;-) ): I believe another option is:

pkg <- "somePkg"
pkgAvail <- require(pkg, character.only = TRUE)
if (pkgAvail) ... else ...

But Kurt will be happy to tell you that you can turn off forcing suggested packages for checking by setting _R_CHECK_FORCE_SUGGESTS_=false in your environment. The idea is that maintainers typically want to fully check their functionality, suggesting to force suggests by default. Unless the public repositories such as CRAN and Bioconductor decide to set this option, it provides no solution for anyone who maintains or plans to make available a package through a public R repository such as CRAN or Bioconductor. There is a real need (of some kind) here. Not all packages work on all platforms. For example, the multicore package provides a mechanism for running parallel computations on a multi-cpu box, but it is not available on Windows. A package that _is_ available on all platforms should be able to optionally make use of multicore on non-Windows. I don't think there is a way to do that now and pass check without resorting to tricks as above. These tricks are bad as they make it harder to programmatically determine the true suggests. And NAMESPACE brings up another issue in that being able to do conditional imports would be very useful for these cases; otherwise you simply can't make proper use of name spaces for any optional functionality. I'm willing to help work on and test a solution if we can arrive at some consensus as to what the solution looks like. Best, + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How x[, 'colname1'] is implemented?
On 1/1/10 1:40 PM, Peng Yu wrote: On Fri, Jan 1, 2010 at 6:52 AM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote: On Thu, Dec 31, 2009 at 11:27 PM, Peng Yu pengyu...@gmail.com wrote: I don't see where the implementation of '[]' is described. For example, if x is a matrix or a data.frame, how is the lookup of 'colname1' in x[, 'colname1'] executed? Does R perform a lookup in a hash of the colnames? Is the reference O(1) or O(n), where n is the second dim of x? Where have you looked? I doubt this kind of implementation detail is in the .Rd documentation since a regular user doesn't care for it. I'm not complaining that it is not documented. As Obi-wan Kenobi may have said in Star Wars: Use the source, Luke! Line 450 of subscript.c in the source code of R 2.10 is the stringSubscript function. It has this comment: /* The original code (pre 2.0.0) used a ns x nx loop that was too * slow. So now we hash. Hashing is expensive on memory (up to 32nx * bytes) so it is only worth doing if ns * nx is large. If nx is * large, then it will be too slow unless ns is very small. */ Could you explain what ns and nx represent? integers :-) Consider a 5x5 matrix m and a call like m[ , c("C", "D")], then in the call to stringSubscript: s - the character vector of subscripts, here c("C", "D"); ns - length of s, here 2; nx - length of the dimension being subscripted, here 5; names - the dimnames being subscripted, here perhaps c("A", "B", "C", "D", "E"). The definition of large and small here appears to be such that: 457: Rboolean usehashing = in && ( ((ns > 1000 && nx) || (nx > 1000 && ns)) || (ns * nx > 15*nx + ns) ); The 'in' argument is always TRUE AFAICS, so this boils down to: use hashing for x[i] if either length(x) > 1000 or length(i) > 1000 (and we aren't in the trivial case where either length(x) == 0 or length(i) == 0), OR use hashing if (ns * nx > 15*nx + ns). + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
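To make ns and nx concrete, here is a runnable version of the toy example from the reply (the matrix values and names are invented):

```r
## A 5x5 matrix with column names, mirroring the m[ , c("C", "D")]
## example: ns = 2 subscripts matched against nx = 5 columns -- far
## below the thresholds, so no hashing would be used here.
m <- matrix(1:25, nrow = 5,
            dimnames = list(NULL, c("A", "B", "C", "D", "E")))
m[, c("C", "D")]   # same result as m[, 3:4]
```

For very wide matrices or long name subscripts, the hashed path kicks in per the condition quoted above.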
Re: [Rd] Error in namespaceExport(ns, exports) :
On 12/3/09 3:10 PM, David Scherrer wrote: Dear all, I get the error "Error in namespaceExport(ns, exports) : undefined exports function1, function2" when compiling or even when I roxygenize my package. The two functions were once in my package, but I deleted them, including their .Rd files. I also can't find them in any other function or help file. So does anybody know where these functions could still be listed, causing this error? Are you sure they are not in your NAMESPACE file? -- Seth Falcon Program in Computational Biology | Fred Hutchinson Cancer Research Center __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to generate dependency file that can be used by gnu make?
On 11/17/09 5:02 AM, Peng Yu wrote: This may not be easy to do when the filenames are not hard-coded strings. For example, the variable 'filename' is a vector of strings.

for (i in 1:length(filename)) {
    ## do something...
    save(..., file = filename[i])
}

That's right. I don't think there is a feasible general solution. You might have more success with a convention-based approach for your scripts that would allow a simple parser to identify output files by name convention, for example. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] error checks
On 11/13/09 8:02 AM, Tony Plate wrote: Putting options(error=function() NULL) at the start of the .R file will let R CMD check continue with commands in a file after stop() is called. (Or anything other than the default options(error=NULL).) But that's a rather heavy-handed approach and could easily mask errors that you are not expecting. Instead, how about using tryCatch so that you limit the errors that you trap and also can verify that an error was indeed trapped. Perhaps something like this:

f <- function(x) if (x) stop("crash!") else NULL
res <- tryCatch({
    f(TRUE)   # this will raise an error
    FALSE     # only get here if no error
}, error = function(e) TRUE)
## verify we saw an error
stopifnot(res)

+ seth -- Seth Falcon | @sfalcon | http://userprimary.net/users __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] typo in docs for unlink()
On 11/11/09 2:36 AM, Duncan Murdoch wrote: On 10/11/2009 11:16 PM, Tony Plate wrote: PS, I should have said that I'm reading the docs for unlink in R-2.10.0 on a Linux system. The docs that appear in a Windows installation of R are different (the Windows docs do not mention that not all systems support recursive=TRUE). Here's a plea for docs to be uniform across all systems! Trying to write R code that works on all systems is much harder when the docs are different across systems, and you might only see system-specific notes on a different system than the one you're working on. That's a good point, but in favour of the current practice, it is very irritating when searches take you to functions that don't work on your system. One thing that might be possible is to render all versions of the help on all systems, but with some sort of indicator (e.g. a colour change) to indicate things that don't apply on your system, or only apply on your system. I think the hardest part of doing this would be designing the output; actually implementing it would not be so bad. I would be strongly in favor of a change that provided documentation for all systems on all systems. Since platform-specific behavior for R functions is the exception rather than the norm, I would imagine that simply displaying doc sections by platform would be sufficient. I think the benefit of being able to see what might not work on another platform far outweighs the inconvenience of finding doc during a search for something that only works on another platform -- hey, that still might be useful as it would tell you what platform you should use ;-) + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] memory misuse in subscript code when rep() is called in odd way
Hi, On 11/3/09 2:28 PM, William Dunlap wrote: The following odd call to rep() gives somewhat random results: rep(1:4, 1:8, each=2) I've committed a fix for this to R-devel. I admit that I had to reread the rep man page as I first thought this was not a valid call to rep since times (1:8) is longer than x (1:4), but closer reading of the man page says: If times is a vector of the same length as x (after replication by each), the result consists of x[1] repeated times[1] times, x[2] repeated times[2] times and so on. So the expected result is the same as rep(rep(1:4, each=2), 1:8). valgrind says that the C code is using uninitialized data: rep(1:4, 1:8, each=2) ==26459== Conditional jump or move depends on uninitialised value(s) ==26459==at 0x80C557D: integerSubscript (subscript.c:408) ==26459==by 0x80C5EDC: Rf_vectorSubscript (subscript.c:658) A little investigation seems to suggest that the problem is originating earlier. Debugging in seq.c:do_rep I see the following: rep(1:4, 1:8, each=2) Breakpoint 1, do_rep (call=0x102de0068, op=value temporarily unavailable, due to optimizations, args=value temporarily unavailable, due to optimizations, rho=0x1018829f0) at /Users/seth/src/R-devel-all/src/main/seq.c:434 434 ans = do_subset_dflt(R_NilValue, R_NilValue, list2(x, ind), rho); (gdb) p Rf_PrintValue(ind) [1] 1 1 1 2 2 2 [7] 2 2 2 2 3 3 [13] 3 3 3 3 3 3 [19] 3 3 3 4 4 4 [25] 4 4 4 4 4 4 [31] 4 4 4 4 4 4 [37] 44129344 1 44129560 1 44129776 1 [43] 44129992 1 44099592 1 44099808 1 [49] 44100024 1 44100456 127241443801089 [55] -536870733 0 54857992 1 22275728 1 [61]2724144 1 34 1 44100744 1 [67] 44100960 1 44101176 1 43652616 1 $2 = void (gdb) c Continuing. Error: only 0's may be mixed with negative subscripts The patch I applied adjusts how the index vector length is computed when times has length more than one. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
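With the fix applied, the equivalence quoted from the rep man page can be checked directly (on any R that includes the fix):

```r
## Per ?rep: when 'times' has the same length as x after replication by
## 'each', element i of the expanded x is repeated times[i] times. So
## rep(1:4, 1:8, each=2) must equal rep(rep(1:4, each=2), 1:8).
a <- rep(1:4, times = 1:8, each = 2)
b <- rep(rep(1:4, each = 2), times = 1:8)
identical(a, b)   # TRUE
length(a)         # sum(1:8) == 36
```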
Re: [Rd] parse_Rd and/or lazyload problem
Hi, On 11/3/09 6:51 PM, mark.braving...@csiro.au wrote:

file.copy('d:/temp/Rdiff.Rd', 'd:/temp/scrunge.Rd') # Rdiff.Rd from 'tools' package source
eglist <- list(scrunge = parse_Rd('d:/temp/scrunge.Rd'))
tools:::makeLazyLoadDB(eglist, 'd:/temp/ll')
e <- new.env()
lazyLoad('d:/temp/ll', e)
as.list(e) # force; OK
eglist1 <- list(scrunge = parse_Rd('d:/temp/Rdiff.Rd'))
tools:::makeLazyLoadDB(eglist1, 'd:/temp/ll')
e <- new.env()
lazyLoad('d:/temp/ll', e)
as.list(e) # Splat

It doesn't make any difference which file I process first; the error comes the second time round. If I adjust this example in terms of paths and run on OS X, I get the following error on the second run:

as.list(e) # Splat
Error in as.list.environment(e) : internal error -3 in R_decompress1

I haven't looked further yet. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Help with lang4
On 10/29/09 7:00 AM, Abhijit Bera wrote: Hi I seem to have run into a situation where I have more than 3 arguments to pass to a function from C. The following functions help me build an expression for evaluation: lang1 lang2 lang3 lang4 What should one do if there are more arguments than lang4 can handle? If you take a look at the source code for those functions, something may suggest itself. R function calls at the C level are composed like in lisp: a pair-list starting with the function cons'ed with the args. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Help with lang4
On 10/29/09 7:38 AM, Abhijit Bera wrote: Can't find the source to the Rf_lang* series of functions. :| But I'm thinking it should be like this, correct me if I'm wrong:

PROTECT(e = lang4(install("myfunction"), arg1, arg2, arg3));
PROTECT(SETCAR(CDR(e), portConstraints));
PROTECT(portVal = R_tryEval(e, R_GlobalEnv, NULL));

Perhaps I'm misunderstanding your goal, but I do not think this is correct. After this call: PROTECT(e = lang4(install("myfunction"), arg1, arg2, arg3)); e can be visualized as: (myfunction (arg1 (arg2 (arg3 nil)))) If you want to end up with: (myfunction (arg1 (arg2 (arg3 (arg4 nil))))) Then you either will want to build up the pair list from scratch or you could use some of the helpers, e.g. (all untested):

SEXP last = lastElt(e);
SEXP arg4Elt = lang1(arg4);
SETCDR(last, arg4Elt);

Reading Rinlinedfuns.h should help some. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] (PR#14012)
* On 2009-10-16 at 15:00 +0200 sj...@damtp.cam.ac.uk wrote: I think Rscript has a problem running files that have Mac encodings for newline (^M rather than ^J on Linux). If I source the file within R, it works okay: source('j.R') [1] MEA_data/sernagor_new/CRX_P7_1.txt But if I run the file using Rscript on a Linux box I get a strange error message: $ Rscript --vanilla j.R Execution halted I think you are right that Rscript is unhappy handling files with CR line terminators. But IIUC, the purpose of Rscript is to enable R script execution on unix-like systems like:

#!/path/to/Rscript --vanilla
print(1:10)

So then I'm not sure how useful it is for Rscript to handle such files. Why not convert to a more common and portable line termination for your R script files? + seth -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
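One way to do the suggested conversion from within R itself; the file names here ('mac.R', 'unix.R') are hypothetical:

```r
## Create a CR-terminated script (as an old Mac editor might produce),
## then rewrite it with LF line endings so Rscript can parse it.
cat("x <- 1:3\rprint(sum(x))\r", file = "mac.R")

txt <- readChar("mac.R", file.info("mac.R")$size)
## split on CR and let writeLines terminate each line with "\n"
writeLines(strsplit(txt, "\r")[[1]], "unix.R")

readLines("unix.R")
```

On unix-like systems, `tr -d '\r'` or `dos2unix`-style tools accomplish the same thing outside R.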
Re: [Rd] how to document stuff most users don't want to see
Writing good documentation is hard. I can appreciate the desire to find technological solutions that improve documentation. However, the benefit of a help system that allows for varying degrees of verbosity is very likely to be overshadowed by the additional complexity imposed on the help system. Users would need to learn how to tune the help system. Developers would need to learn and follow the system of variable verbosity. This time would be better spent by developers simply improving the documentation and by users simply reading the improved documentation. My $0.02. + seth -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] unit testing for R packages?
Hi, On Mon, Oct 5, 2009 at 12:01 PM, Blair Christian blair.christ...@gmail.com wrote: I'm interested in putting some unit tests into an R package I'm building. I have seen assorted things such as the RUnit library, the svUnit library, packages with 'tests' directories, etc. I grep'd 'unit test' through the Writing R Extensions manual but didn't find anything. Are there any suggestions out there? Currently I have several (a lot?) classes/methods that I keep tinkering with, and I'd like to run a script frequently to check that I don't cause any unforeseen problems. I've had good experiences using RUnit. To date, I've mostly used RUnit by putting tests in inst/unitTests and creating a Makefile there to run the tests. You should also be able to use RUnit in a more interactive fashion inside an interactive R session in which you are doing development. The vignette in svUnit has an interesting approach for integrating unit testing into R CMD check via examples in an Rd file within the package. + seth -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] build time dependency
On Mon, Sep 28, 2009 at 11:25 AM, Romain Francois romain.franc...@dbmail.com wrote: Hi Uwe, I think you are supposed to do this kind of sequence: R CMD roxygen yourRoxygenablePackage R CMD build yourRoxygenablePackage_roxygen ... but I don't like this because what you upload to cran is not the actual source but something already pre-processed. (This also applies to packages shipping java code, most people just compile the java code on their machine and only supply a jar of compiled code, but that's another story I suppose ...) I'd prefer the roxygenation to be part of the standard build/INSTALL system, so my plan is to write configure and configure.win which would call roxygenize to generate Rd. I can appreciate the desire to make the true sources available. At the same time, I think one should very carefully consider the expense of external dependencies on a package. One could view doc generation along the same lines as configure script generation -- a compilation step that can be done once instead of by all those who install, and as a result reduce the dependency burden of those wanting to install the package. Configure scripts are almost universally included pre-built in distribution source packages so that users do not need to have the right version of autoconf/automake. In other words, are you sure you want to require folks to install roxygen (or whatever) in order to install your package? Making it easy to do so is great, but in general if you can find a way to reduce dependencies and have your package work, that is better. :-) + seth -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] crash with NAs in subscripted assignment of a raw vector
2009/9/24 Hervé Pagès hpa...@fhcrc.org:

> x <- charToRaw("ABCDEFGx")
> x[c(1:3, NA, 6)] <- x[8]
*** caught segfault ***
address 0x8402423f, cause 'memory not mapped'

Thanks for the report. I have a fix which I will commit after some testing. -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Rcmdr package dependencies
* On 2009-09-22 at 20:16 +0200 Uwe Ligges wrote: no, this is not possible. Consider your package A (or Rcmdr) suggests B that suggests C. Then A::foo uses the function B::bar which only works if C::dep is present. B works essentially without C but it requires C just to make bar work. Then this means your A::foo won't work if C is not installed and you won't get it with the setup mentioned above. In summary, I fear what you want might work well *now* (by chance), but it does not work in general. In general, one would expect a given package to function when its suggested packages are not available. As such, it seems quite reasonable to install a package, its Depends, Imports, and Suggests, but not install Suggests recursively. I think you could achieve such an installation using two calls to install.packages:

install.packages("Rcmdr")
Rcmdr.Suggests <- strsplit(packageDescription("Rcmdr")$Suggests, ",\\s?")[[1]]
## need extra cleanup since packageDescription("blah")$Suggests
## returns package names with versions as strings
wantPkgs <- sub("^([^ ]+).*", "\\1", Rcmdr.Suggests)
havePkgs <- installed.packages()[, "Package"]
wantPkgs <- wantPkgs[!(wantPkgs %in% havePkgs)]
install.packages(wantPkgs)

+ seth -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
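The version-stripping step can be exercised without any package installed, using a made-up Suggests field of the shape packageDescription() returns (the package names below are invented):

```r
## A hypothetical Suggests field: names optionally followed by a
## version requirement in parentheses.
suggests <- "foo (>= 1.0), bar, baz (>= 0.2-1)"
pkgs <- strsplit(suggests, ",\\s?")[[1]]

## keep only the leading package name, dropping any version spec
sub("^([^ ]+).*", "\\1", pkgs)   # "foo" "bar" "baz"
```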
Re: [Rd] asking for suggestions: interface for a C++ class
* On 2009-09-04 at 22:54 +0200 Yurii Aulchenko wrote: We are at an early stage of designing an R library, which is effectively an interface to a C++ library providing fast access to large matrices stored on HDD as binary files. The core of the C++ library is a relatively sophisticated class, which we try to mirror using an S4 class in R. Basically when a new object of that class is initiated, the C++ constructor is called and essential elements of the new object are reflected as slots of the R object. Have a look at external pointers as described in the Writing R Extensions manual. Now as you can imagine the problem is that if the R object is removed using say the rm command, and not our specifically designed one, the C++ object still hangs around in RAM until the R session is terminated. This is not nice, and also may be a problem, as the C++ object may allocate a large part of RAM. We can of course replace the generic rm and delete functions, but this is definitely not a nice solution. You likely want a less literal translation of your C++ object into R's S4 system. One slot should be an external pointer, which will give you the ability to define a finalizer to clean up when the R level object gets gc'd. + seth -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Conditional dependency between packages
Hi Jon, * On 2009-06-30 at 15:27 +0200 Jon Olav Skoien wrote: I work on two packages, pkg1 and pkg2 (in two different projects). pkg1 is quite generic, pkg2 tries to solve a particular problem within the same field (geostatistics). Therefore, there might be users who want to use pkg2 as an add-on package to increase the functionality of pkg1. In other words, functions in pkg1 are based on the S3 class system, and I want pkg2 to offer methods for pkg2-objects to functions defined in pkg1, for users having both packages installed. Merging the packages or making pkg2 always depend on pkg1 would be the easiest solution, but it is not preferred as most users will only be interested in one of the packages. I'm not sure I understand the above, I think you may have a pkg2 where you meant pkg1, but I'm not sure it matters. I think the short version is: pkg2 can be used on its own but will do more if pkg1 is available. I don't think R's packaging system currently supports conditional dependencies as you might like. However, I think you can get the behavior you want by following a recipe like: * In the pkg2 DESCRIPTION, list Suggests: pkg1. * In pkg2 code, you might define a package-level environment and in .onLoad check to see if pkg1 is available:

PKG_INFO <- new.env(parent = emptyenv())

.onLoad <- function(libname, pkgname) {
    if (<check if pkg1 is available>) {
        PKG_INFO[["pkg1"]] <- TRUE
    }
}

* Then your methods can check PKG_INFO[["pkg1"]]. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] bug in Rf_PrintValue ?
Hi, * Kynn Jones wrote: I'm very green with R, so maybe this is not a bug, but it looks like one to me. The following program segfaults at the second call to Rf_PrintValue(). Yes, I think you found a bug. * On 2009-06-26 at 16:09 -0700 Martin Morgan wrote: mkChar creates a CHARSXP. These are not normally user-visible, but instead are placed into a STRSXP (vector type of 'character' in R). So you want to PROTECT( x_r = allocVector(STRSXP, 1) ); SET_STRING_ELT(x_r, 0, mkChar( x )); (There is also mkString( x ) for the special case of constructing character(1)). I think the segfault is because the CHARSXP returned by mkChar is initialized with information different from that expected of user-visible SEXPs (I think it is the information on chaining the node to the hash table; see Defn.h:120 and memory.c:2844); I think the success of Rf_PrintValue on 'foo' is a ghost left over from when CHARSXPs were user-visible. CHARSXPs are not intended to be user-visible. However, Rf_PrintValue should not segfault either. Indeed the root cause was attempting to print the attributes for the CHARSXP which have been repurposed for handling the CHARSXP cache. I have patched R-devel so that PrintValue works as expected on CHARSXPs. The original code should now work without crashing. But this should really only be used to assist in debugging. CHARSXPs should never be exposed at the user level and should instead be elements of a character vector (STRSXP). + seth -- Seth Falcon http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64
Hi Dirk, * On 2009-01-30 at 22:38 -0600 Dirk Eddelbuettel wrote: Turns out, as so often, that there was a regular bug lurking which is now fixed in RDieHarder 0.1.1. But I still would like to understand exactly what is different so that --slave was able to trigger it when --vanilla, --no-save, ... did not. [ The library() vs require() issue may have been a red herring. ] Without telling us any details about the nature of the bug you found, it is difficult to speculate. If the bug was in your C code and memory related, it could simply be that the two different run paths resulted in different allocation patterns, one of which triggered the bug. + seth -- Seth Falcon | http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64
* On 2009-01-31 at 09:34 -0600 Dirk Eddelbuettel wrote: | Without telling us any details about the nature of the bug you found, | it is difficult to speculate. If the bug was in your C code and | memory related, it could simply be that the two different run paths | resulted in different allocation patterns, one of which triggered the | bug. Yes yes and yes :) It was in C, and it was memory related, and it dealt with getting results out of the library to which the package interfaces. But short of looking at the source, is there any documentation on what --slave does differently? The R-intro manual has a brief description: --slave Make R run as quietly as possible. This option is intended to support programs which use R to compute results for them. It implies --quiet and --no-save. I suspect that for more detail than that, one would have to look at the sources. But the above helps explain the behavior you saw; a --quiet R will suppress some output and that will make a difference in terms of memory allocation. + seth -- Seth Falcon | http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R crashes on sprintf with bad format specification (PR#13283)
* On 2008-11-13 at 18:51 -0500 Duncan Murdoch wrote: On 12/11/2008 8:30 PM, [EMAIL PROTECTED] wrote: Full_Name: Oren Cheyette Version: 2.7.2 OS: Win XP Submission from: (NULL) (64.161.123.194) Enter the following at the R command prompt: sprintf("A %S %S %S XYZ", 1, 1, 1); Note the erroneous capitalized %S instead of %s and the numeric inputs instead of strings. With strings there's no crash - R reports bad format specifications. 2.7.2 is obsolete, but I can confirm a crash on Windows with a recent R-devel. Can confirm as well on OSX with a fairly recent R-devel. (gdb) bt 10 #0 0x9575e299 in _UTF8_wcsnrtombs () #1 0x957bb3a0 in wcsrtombs_l () #2 0x956ebc1e in __vfprintf () #3 0x95711e66 in sprintf () #4 0x00492bb8 in do_sprintf (call=0x10cb470, op=0x1018924, args=value temporarily unavailable, due to optimizations, env=0x10a40b0) at ../../../../R-devel-all/src/main/sprintf.c:179 #5 0x003fe1af in do_internal (call=0x10cb4a8, op=0x100fc38, args=0x10a40e8, env=0x10a40b0) at ../../../../R-devel-all/src/main/names.c:1140 + seth -- Seth Falcon | http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] S4 coercion responsibility
A couple more comments... * On 2008-09-15 at 10:07 -0700 Seth Falcon wrote: The example is with RSQLite but the same thing happens with RMySQL, and other DBI packages. The use of as() within the various DBI packages should be re-evaluated. I suspect some of that code was among the first to make heavy use of S4. As S4 has evolved and become better documented and understood, the DBI packages may not always have had a chance to keep up.

library(RSQLite)
Loading required package: DBI
m <- dbDriver("SQLite")
con <- dbConnect(m)
setClass("SQLConPlus", contains = c("SQLiteConnection", "integer"))
[1] "SQLConPlus"
conPlus <- new("SQLConPlus", con, 1)
dbListTables(con)
character(0)
dbListTables(conPlus)

In the latest R-devel code (svn r46542), this behaves differently (and works as you were hoping). I get:

library(RSQLite)
setClass("SQLConPlus", contains = c("SQLiteConnection", "integer"))
dd = data.frame(a = 1:3, b = letters[1:3])
con = new("SQLConPlus", dbConnect(SQLite(), ":memory:"), 11L)
dbWriteTable(con, "t1", dd)
dbListTables(con)
dbDisconnect(con)

I know that the methods package has been undergoing some improvements recently, so it is not entirely surprising that behavior has changed. I think the new behavior is desirable as it follows the rule that the order of the superclasses listed in contains is used to break ties when multiple methods match. Here, there are two coerce() methods (invoked via as()) one for SQLiteConnection and one, I believe auto-generated, for integer. Since SQLiteConnection comes first, it is chosen. Indeed, if you try the following, you get the error you were originally seeing:

setClass("SQLConMinus", contains = c("integer", "SQLiteConnection"))
con2 = new("SQLConMinus", dbConnect(SQLite(), ":memory:"), 11L)
as(con, "integer")
[1] 15395 2
as(con2, "integer")
[1] 11

Why not extend SQLiteConnection and add extra slots as you like? The dispatch will in this case be much easier to reason about. This is still appropriate advice. 
In general, inheritance should be used with care, and multiple inheritance should be used with multiple care. Using representation() to add additional slots likely makes more sense here.

+ seth

--
Seth Falcon | http://userprimary.net/user/

    sessionInfo()
    R version 2.8.0 Under development (unstable) (--)
    i386-apple-darwin9.4.0

    locale:
    C

    attached base packages:
    [1] stats graphics grDevices datasets utils methods base

    other attached packages:
    [1] RSQLite_0.7-0 DBI_0.2-4
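[As a sketch of that advice -- class and slot names invented here -- single inheritance plus a slot leaves only one coerce path to reason about:]

```r
library(RSQLite)

## Hypothetical alternative to contains=c("SQLiteConnection", "integer"):
## one parent class, and the extra integer carried in a slot.
setClass("SQLConPlus2",
         contains = "SQLiteConnection",
         representation(extra = "integer"))

con <- new("SQLConPlus2",
           dbConnect(SQLite(), ":memory:"),
           extra = 11L)

as(con, "SQLiteConnection")  # only one coerce method can match now
con@extra
```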
Re: [Rd] S4 coercion responsibility
* On 2008-09-17 at 19:25 -0700 Seth Falcon wrote:
> In the latest R-devel code (svn r46542), this behaves differently
> (and works as you were hoping). I get:
>
>     library(RSQLite)
>     setClass("SQLConPlus", contains=c("SQLiteConnection", "integer"))
>     dd = data.frame(a=1:3, b=letters[1:3])
>     con = new("SQLConPlus", dbConnect(SQLite(), ":memory:"), 11L)
>     dbWriteTable(con, "t1", dd)
>     dbListTables(con)
>     dbDisconnect(con)

*argh* I'm certain this was working for me and yet when I try to reproduce in a new R shell it errors out. The dispatch is not as I wrote:

    as(con, "integer")
    [1] 11

That is, the auto-generated coerce method to integer is selected in preference to the coerce method for SQLiteConnection.

> I think the new behavior is desirable as it follows the rule that the
> order of the superclasses listed in contains is used to break ties
> when multiple methods match. Here, there are two coerce() methods
> (invoked via as()): one for SQLiteConnection and one, I believe
> auto-generated, for integer. Since SQLiteConnection comes first, it
> is chosen. Indeed, if you try the following, you get the error you
> were originally seeing:
>
>     setClass("SQLConMinus", contains=c("integer", "SQLiteConnection"))
>     con2 = new("SQLConMinus", dbConnect(SQLite(), ":memory:"), 11L)
>     as(con, "integer")
>     [1] 15395 2
>     as(con2, "integer")
>     [1] 11

I'm still baffled how this was working for me and now is not. Nevertheless, I think it is how things *should* work and will do some further investigation about what's going on.

--
Seth Falcon | http://userprimary.net/user/
Re: [Rd] S4 coercion responsibility
Continuing to talk to myself here...

* On 2008-09-17 at 21:06 -0700 Seth Falcon wrote:
> *argh* I'm certain this was working for me and yet when I try to
> reproduce in a new R shell it errors out.

This looks like an infelicity in the methods caching. To make it work:

    library(RSQLite)
    setClass("SQLConPlus", contains=c("SQLiteConnection", "integer"))
    dd = data.frame(a=1:3, b=letters[1:3])
    con.orig = dbConnect(SQLite(), ":memory:")
    con = new("SQLConPlus", con.orig, 11L)

    ## call selectMethod, must have a side-effect on the
    ## methods cache
    selectMethod("coerce", signature=c("SQLConPlus", "integer"))

    dbWriteTable(con, "t1", dd)
    dbListTables(con)
    dbDisconnect(con)

Now I get:

    as(con, "integer")
    [1] 15719 0

Haven't tried the above in an older version of R.

--
Seth Falcon | http://userprimary.net/user/
Re: [Rd] S4 coercion responsibility
* On 2008-09-15 at 08:56 -0400 Paul Gilbert wrote:
> Should functions or the user be responsible for coercing an S4 object
> argument containing the proper object (and thus should the below be
> considered a bug in the packages or not)? The example is with RSQLite
> but the same thing happens with RMySQL, and other DBI packages.
>
>     library(RSQLite)
>     Loading required package: DBI
>     m <- dbDriver("SQLite")
>     con <- dbConnect(m)
>     setClass("SQLConPlus", contains=c("SQLiteConnection", "integer"))
>     [1] "SQLConPlus"
>     conPlus <- new("SQLConPlus", con, 1)
>     dbListTables(con)
>     character(0)
>     dbListTables(conPlus)
>     Error in sqliteExecStatement(con, statement, bind.data) :
>       RS-DBI driver: (invalid dbManager handle)
>     dbListTables(as(conPlus, "SQLiteConnection"))
>     character(0)
>
> The problem is happening in sqliteExecStatement which does
>
>     conId <- as(con, "integer")
>
> but con only *contains* an SQLiteConnection and the other integer
> causes confusion. If the line were
>
>     conId <- as(as(con, "SQLiteConnection"), "integer")
>
> everything works. I can work around this, but I am curious where
> responsibility for this coercion should be.

Well, you've created a class that is-a SQLiteConnection *and* is-a integer. The fact that the as() method dispatch doesn't match that of SQLiteConnection shouldn't really be that surprising. I don't see how this could be the responsibility of the author of the class you've subclassed.

I would also question why SQLConPlus is extending integer. That seems like a very strange choice. Why not extend SQLiteConnection and add extra slots as you like? The dispatch will in this case be much easier to reason about.

+ seth

--
Seth Falcon | http://userprimary.net/user/
Re: [Rd] how to install header files in package
* On 2008-06-13 at 08:40 -0500 Dirk Eddelbuettel wrote: On 13 June 2008 at 14:28, Kjell Konis wrote: | Is there a way to get R CMD INSTALL (and friends) to copy the header | files from a source package's src directory to the include directory? Only if you (ab-)use the 'make all' target in src/Makefile to copy them, as a recent thread on r-devel showed. Some of us suggested that a 'make install' target would be a nice thing to have. Can you elaborate on the use case? If the desire is to allow pkgB to access header files provided by pkgA, then you can use the LinkingTo field in the DESCRIPTION file as described in Writing R Extensions in the Registering native routines section. + seth -- Seth Falcon | http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
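[A minimal sketch of the LinkingTo arrangement described above; package and file names are invented for illustration. pkgA ships its headers under inst/include/, and pkgB declares the dependency:]

```
## pkgA: put headers under inst/include/, e.g. inst/include/pkgA.h
## (they are installed to pkgA's include/ directory)

## pkgB/DESCRIPTION:
Package: pkgB
Depends: pkgA
LinkingTo: pkgA

## pkgB/src/foo.c can then do
##     #include <pkgA.h>
## because R CMD INSTALL adds pkgA's installed include directory to
## the preprocessor search path when compiling pkgB.
```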
[Rd] RSQLite 0.6-9 uploaded to CRAN [was: RSQLite bug fix for install with icc]
Hi all,

A new version of RSQLite has been uploaded to CRAN and should be available soon. This update contains a minor change to the C code that should improve compatibility on various Unix OSes.

+ seth

--
Seth Falcon | http://userprimary.net/user/
Re: [Rd] RSQLite bug fix for install with icc
Hi Mark,

[the r-sig-db list might have been a better spot for this...]

* On 2008-06-04 at 14:28 -0400 Mark Kimpel wrote:
> I encountered problems installing RSQLite, R-2.7.0, on RHEL4 using
> Intel 10.1 icc. My sysadmin helped me track down the problem and
> kindly forwarded me the fix, which corrected the problem. What
> follows is from the sysadmin. Mark
>
> I looked at the error, looks like there is a bug in the source code.
> I've attached a new tarball, hopefully fixed. I added
> #include <sys/types.h> immediately before #include <unistd.h> in
> RSQLite/src/RS-DBI.h

I will see about making such a change. I suspect the correct fix is one that tweaks configure to determine where things are based on the current system (the current code is correct for gcc, I believe). Anyhow, thanks for the report. I will try to have an update within a week.

+ seth

--
Seth Falcon | http://userprimary.net/user/
Re: [Rd] NAMESPACE methods guidance, please
* On 2008-06-01 at 11:30 -0400 John Chambers wrote:
> My impression (but just as a user, not an implementer) is that the
> NAMESPACE mechanism is intended to search for anything, not just for
> methods, as follows:
> - look in the namespace itself;
> - look in the imports, which are in the parent.env of the namespace;
> - look in the base package's namespace.

As described in the R News article [1], the above describes the static component of the search mechanism, but there is a dynamic component which adds:

- look in .GlobalEnv
- look in each package on the search path
- look (again) in base

[1] http://cran.r-project.org/doc/Rnews/Rnews_2003-1.pdf

> Period. This provides a definition of the behavior of functions in
> the package that is independent of the dynamically changing contents
> of the search list.

I think the dynamic lookup is important. Consider class Foo and some methods, like show, for working with Foo instances defined in pkgA. Further, suppose pkgB imports pkgA and contains a function that returns a Foo instance. If a user calls library(pkgB) at the prompt, both the developer and the user would like for methods for dealing with Foo instances to be available. This has been achieved by adding pkgA to the Depends field of pkgB. In this case, library(pkgB) has the side-effect of attaching pkgA to the search path and Foo instances behave as desired. This, I believe, describes the first part of Martin's example:

Martin Morgan:
> library(KEGG.db)  # Imports, Depends AnnotationDbi; KEGG.db is data-only
> head(ls(KEGGPATHID2EXTID))
> [1] "hsa00232" "hsa00230" "hsa04514" "hsa04010" "hsa04012" "hsa04150"

John Chambers:
> Depends may cause the relevant packages to be put on the search list.
> But a subsequent attach or detach could change what objects were
> found. So unless this is not the intended interpretation of
> namespaces, looking in the search list seems a bad idea in principle.

I agree that using the dynamic lookup when the static lookup is available is bad programming practice.
However, given the flexibility of the current tools, it seems not unreasonable to expect that picking up a method via the search path would work in a package just as it does (should?) interactively. + seth -- Seth Falcon | http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
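[A sketch of what the static alternative looks like in practice; package and function names are hypothetical. Instead of relying on pkgA sitting on the search path at run time, pkgB imports what it needs in its NAMESPACE:]

```
## pkgB/NAMESPACE (hypothetical)
importFrom(pkgA, foo)   # static: resolved at load time,
                        # independent of the search path
export(makeFoo)

## versus the dynamic route: list pkgA in pkgB's Depends field, so that
## library(pkgB) also attaches pkgA and methods for Foo instances are
## found via the search path.
```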
Re: [Rd] R 2.7.0, match() and strings containing \0 - bug?
Hi Jon,

* On 2008-04-28 at 11:00 +0100 Jon Clayden wrote:
> A piece of my code that uses readBin() to read a certain file type is
> behaving strangely with R 2.7.0. This seems to be because of a
> failure to match() strings after using rawToChar() when the original
> was terminated with a \0 character. Direct equality testing with ==
> still works as expected. I can reproduce this as follows:
>
>     x <- "foo"
>     y <- c(charToRaw("foo"), as.raw(0))
>     z <- rawToChar(y)
>     z == x
>     [1] TRUE
>     z == "foo"
>     [1] TRUE
>     z %in% c("foo", "bar")
>     [1] FALSE
>     z %in% c("foo", "bar", "foo\0")
>     [1] FALSE
>
> But without the nul character it works fine:
>
>     zz <- rawToChar(charToRaw("foo"))
>     zz %in% c("foo", "bar")
>     [1] TRUE
>
> I don't see anything about this in the latest NEWS, but is this
> expected behaviour? Or is it, as I suspect, a bug? This seems to be
> new to R 2.7.0, as I said.

The short answer is that your example works in R-2.6 and in the current R-devel. Whether the behavior in R-2.7 is a bug is perhaps in the eye of the beholder.

Historically, R's internal string representation allowed for embedded nul characters. This was particularly useful before the raw vector type, RAWSXP, was introduced. Since the vast majority of R's internal string processing functions use standard C semantics and truncate at the first nul, there has always been some room for interesting behavior. The change in R-2.7 was an attempt to start resolving these inconsistencies. Since then the core team has agreed to remove the partial support for embedded nuls in character strings -- raw can be used when this is desired, and having nul terminated strings will make the code more consistent and easier to maintain going forward.

Best Wishes,

+ seth

--
Seth Falcon | http://userprimary.net/user/
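[A small sketch of how to stay on the safe side of this change: keep nul-containing bytes as raw, and strip the terminator before converting. The behavior of nul-in-character varies across the R versions discussed here, but this pattern works in all of them:]

```r
y <- c(charToRaw("foo"), as.raw(0))   # bytes with a trailing nul
z <- rawToChar(y[y != as.raw(0)])     # drop nul bytes before converting
z %in% c("foo", "bar")                # TRUE
```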
Re: [Rd] segfault in gregexpr()
Hi again,

Herve wrote:
> gregexpr("", "abc", fixed=TRUE)
> *** caught segfault ***
> address 0x1c09000, cause 'memory not mapped'

This should be fixed in latest svn. Thanks for the report.

+ seth

--
Seth Falcon | [EMAIL PROTECTED] | blog: http://userprimary.net/user/
Re: [Rd] segfault in gregexpr()
Hi Herve,

Thanks for the report. I can reproduce this with latest R-devel. perl=TRUE is also broken. I have a patch which I am testing. With it, I get:

    gregexpr("", "abc")
    [[1]]
    [1] 1 2 3
    attr(,"match.length")
    [1] 0 0 0

    gregexpr("", "abc", fixed=TRUE)
    [[1]]
    [1] 1 2 3
    attr(,"match.length")
    [1] 0 0 0

    gregexpr("", "abc", perl=TRUE)
    [[1]]
    [1] 1 2 3
    attr(,"match.length")
    [1] 0 0 0

+ seth

--
Seth Falcon | [EMAIL PROTECTED] | blog: http://userprimary.net/user/
Re: [Rd] isOpen on closed connections
Roger D. Peng [EMAIL PROTECTED] writes:
> As far as I can tell, 'isOpen' cannot return FALSE in the case when
> 'rw = ""'. If the connection has already been closed by 'close' or
> some other function, then isOpen will produce an error. The problem
> is that when isOpen calls 'getConnection', the connection cannot be
> found and 'getConnection' produces an error. The check to see if it
> is open is never actually done.

I see this too with R-devel (r43376) {from Nov 6th}.

    con = file("example1", "w")
    isOpen(con)
    [1] TRUE
    showConnections()
      description class  mode text   isopen   can read can write
    3 "example1"  "file" "w"  "text" "opened" "no"     "yes"
    close(con)
    isOpen(con)
    Error in isOpen(con) : invalid connection
    ## printing also fails
    con
    Error in summary.connection(x) : invalid connection

> This came up in some code where I'm trying to clean up connections
> after successfully opening them. The problem is that if I try to
> close a connection that has already been closed, I get an error
> (because 'getConnection' cannot find it). But then there's no way for
> me to find out if a connection has already been closed. Perhaps
> there's another approach I should be taking? The context is
> basically,
>
>     con <- file("foo", "w")
>     tryCatch({
>         ## Do stuff that might fail
>         writeLines(stuff, con)
>         close(con)
>         file.copy("foo", "bar")
>     }, finally = {
>         close(con)
>     })

This doesn't address isOpen, but why do you have the call to close inside the tryCatch block? Isn't the idea that finally will always be run, so you can be reasonably sure that close gets called once?

If your real world code is more complicated, perhaps you can make use of a work around like:

    myIsOpen = function(con) tryCatch(isOpen(con), error=function(e) FALSE)

You could do similar with myClose and close a connection as many times as you'd like :-)

+ seth

--
Seth Falcon | [EMAIL PROTECTED] | blog: http://userprimary.net/user/
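[A sketch of the combined pattern being discussed, with close() appearing only in finally so it runs exactly once whether or not the body fails; file names are invented:]

```r
con <- file("foo", "w")
tryCatch({
    writeLines("stuff", con)   # work that might fail
    flush(con)                 # make sure data is on disk before copying
    file.copy("foo", "bar")
}, finally = {
    ## tolerate a connection that has already become invalid
    if (tryCatch(isOpen(con), error = function(e) FALSE))
        close(con)
})
```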
Re: [Rd] RSQLite indexing
Jeffrey Horner [EMAIL PROTECTED] writes:
> Thomas Lumley wrote on 10/22/2007 04:54 PM:
> > I am trying to use RSQLite for storing data and I need to create
> > indexes on two variables in the table. It appears from searching
> > the web that the CREATE INDEX operation in SQLite is relatively
> > slow for large files, and this has been my experience as well.

What is your schema? In particular, are things that are integers or floats being stored that way in SQLite?

I believe the annotation data packages via AnnotationDbi are using cache_size=64000 and synchronous=0 and that this was determined by a handful of experiments on typical annotation dbs.

> Columns with few levels may not benefit from an index. See this
> thread:
> http://thread.gmane.org/gmane.comp.db.sqlite.general/23683/focus=23693

But your column with many levels shouldn't suffer this problem :-)

+ seth

--
Seth Falcon | [EMAIL PROTECTED] | blog: http://userprimary.net/user/
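[A sketch of applying the PRAGMA settings mentioned above before a bulk load; the database, table, and column names are invented for illustration, and the PRAGMA values are the ones reported for the annotation dbs:]

```r
library(RSQLite)

con <- dbConnect(SQLite(), dbname = "big.db")

## larger page cache, and no fsync while (re)building the db
dbGetQuery(con, "PRAGMA cache_size = 64000")
dbGetQuery(con, "PRAGMA synchronous = 0")

## bulk-load first, index afterwards: building an index over a loaded
## table is typically cheaper than maintaining it row by row
## dbWriteTable(con, "probes", df)   # hypothetical data frame
dbGetQuery(con, "CREATE INDEX idx_probe ON probes (probe_id)")
```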
Re: [Rd] warning upon automatic close of connection
Gabor Grothendieck [EMAIL PROTECTED] writes:
> I noticed that under R 2.6.0 there is a warning about closing the
> connection in the code from this post:
> https://stat.ethz.ch/pipermail/r-help/2007-September/140601.html
> which is evidently related to the following from the NEWS file:
>
>   o Connections will be closed if there is no R object referring to
>     them. A warning is issued if this is done, either at garbage
>     collection or if all the connection slots are in use.
>
> If we use read.table directly it still happens:
>
>     # use Lines and Lines2 from cited post
>     library(zoo)
>     DF1 <- read.table(textConnection(Lines), header = TRUE)
>     DF2 <- read.table(textConnection(Lines2), header = TRUE)
>     z1 <- zoo(as.matrix(DF1[-1]), as.Date(DF1[,1], "%d/%m/%Y"))
>     z2 <- zoo(as.matrix(DF2[-1]), as.Date(DF2[,1], "%d/%m/%Y"))
>     both <- merge(z1, z2)
>     plot(na.approx(both))
>     R.version.string # Vista
>     [1] "R version 2.6.0 alpha (2007-09-06 r42791)"
>
> Is this annoying warning really necessary? I assume we can get rid of
> it by explicitly naming and closing the connections but surely there
> should be a way to avoid the warning without going to those lengths.

Up until the change you mention above it really was necessary to name and close all connections. Short scripts run in fresh R sessions may not have had problems with code like you have written above, but longer programs or shorter ones run in a long running R session would run out of connections. Now that connections have weak reference semantics, one can ask whether this behavior should be standard and no warning issued.

> I would have thought that read.table opens the connection then it
> would close it itself so no warning would need to be generated.

In your example, read.table is _not_ opening the connection. You are passing an open connection which has no symbol bound to it:

    c = textConnection("foo")
    c
         description            class             mode             text
               "foo" "textConnection"              "r"           "text"
              opened         can read        can write
            "opened"            "yes"             "no"

But I think passing a closed connection would cause the same sort of issue.

It seems that there are two notions of closing a connection: (i) close as the opposite of open, and (ii) clean up the entire connection object. I haven't looked closely at the code here, so I could be wrong, but I'm basing this guess on the following:

    file("foo")
    description      class       mode       text     opened
          "foo"     "file"        "r"     "text"   "closed"
       can read  can write
          "yes"      "yes"

    ## start new R session
    for (i in 1:75) file("foo")
    gc()
             used (Mb) gc trigger (Mb) max used (Mb)
    Ncells 149603  4.0         35  9.4       35  9.4
    Vcells 101924  0.8     786432  6.0   486908  3.8
    There were 50 or more warnings (use warnings() to see the first 50)
    warnings()[1:3]
    $`closing unused connection 76 ("foo")`
    NULL
    $`closing unused connection 75 ("foo")`
    NULL
    $`closing unused connection 74 ("foo")`
    NULL

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
Re: [Rd] R CMD check recursive copy of tests/
Henrik Bengtsson [EMAIL PROTECTED] writes:
> intentional I'd say: I did not implement it, but it seems much more
> logical to keep the previous rule: All *.R files in ./tests/ are run,
> period.
>
> Subdirectories can be useful for organization, notably storing test
> data. I don't think it's a good idea to use so very many test files
> that you need subdirectories, unless maybe you are thinking about
> unit tests; and then, see below.
>
> Examples of subdirectories (some overlapping) are:
>
>   units/         - tests of minimal code modules
>   integration/   - tests of integrating the above units
>   system/        - real-world scenarios/use cases
>   requirements/  - every requirement should have at least one test
>   bugs/          - every bug fix should come with a new test
>   regression/    - every update should have a regression test to
>                    validate backward compatibility etc.
>   robustness/    - testing the robustness of estimators against
>                    outliers as well as extreme parameter settings
>   validation/    - validation of numeric results compared with
>                    alternative implementations or summaries
>   benchmarking/  - actually more measuring time, but can involve
>                    validation that a method is faster than an
>                    alternative
>   crossplatform/ - validate correctness across platforms
>   torture/       - pushing the limits

Those all seem like reasonable examples, but the fact that R CMD check doesn't recurse really isn't a problem. You can have a driver script at the top-level that runs as many of the tests in subdirs as you want. And this is really a good thing since, as you mentioned later in your response, some tests take a long time to run and probably are best not automatically run during R CMD check.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
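[A sketch of such a top-level driver; the file and directory names are invented. R CMD check runs every *.R file in tests/, so this one file decides which subdirectories get exercised routinely:]

```r
## tests/run-subdir-tests.R (hypothetical driver script)
## Sources every .R file in the chosen subdirectories; slow suites
## (e.g. torture/) are simply left off this list.
for (dir in c("units", "bugs", "regression")) {
    for (f in list.files(dir, pattern = "\\.R$", full.names = TRUE)) {
        cat("Running", f, "\n")
        source(f)
    }
}
```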
Re: [Rd] R CMD check: Error in function (env) : could not find function finalize
Hi Henrik, Henrik Bengtsson [EMAIL PROTECTED] writes: Hi, does someone else get this error message: Error in function (env) : could not find function finalize? I get an error when running examples in R CMD check (v2.6.0; session info below): [snip] The error occurs in R CMD check but also when start a fresh R session and run, in this case, affxparser.Rcheck/affxparser-Ex.R. It always occur on the same line. So does options(error=recover) help in determining where the error is coming from? If you can narrow it down, gctorture may help or running the examples under valgrind. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] .Call and to reclaim the memory by allocVector
Hi Yongchao,

Yongchao Ge [EMAIL PROTECTED] writes:
> Why am I storing a large dataset in R? My program consists of two
> parts. The first part is to get the intermediate results, the
> computation of which takes a lot of time. The second part contains
> many different functions to manipulate the intermediate results. My
> current solution is to save the intermediate results in a temporary
> file, but my final goal is to save them as an R object. The memory
> leak in .Call stops me from doing this and I'd like to know if I can
> have a clean solution for the R package I am writing.

There are many examples of packages that use .Call to create large objects. I don't think there is a memory leak. One thing that may be catching you up is that because of R's pass-by-value semantics, you may be ending up with multiple copies of the object on the R side during some of your operations.

I would recommend recompiling with --enable-memory-profiling and using tracemem() to see if you can identify places where copies of your large object are occurring. You can also take a look at Rprof(memory.profile=TRUE).

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
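[A sketch of the copy-spotting approach; it requires R configured with --enable-memory-profiling, and the large object here is just a stand-in:]

```r
x <- numeric(1e7)   # stand-in for the large object built via .Call
tracemem(x)         # from now on, every duplication of x is reported
x2 <- x             # no copy yet: just another reference to the vector
x2[1] <- 0          # modifying the shared vector triggers the actual
                    # copy, and tracemem prints a line saying so
```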
Re: [Rd] Overriding S4 methods in an installed package
Allen McIntosh [EMAIL PROTECTED] writes:
> Is it possible to override S4 methods in an installed package? The
> naive
>
>     library(pkg)
>     setMethod("foo", signature(obj = "bar"),
>               function(obj, x, y) { new definition },
>               where = "package:pkg")
>
> results in the error
>
>     Error in setMethod("foo", signature(obj = "bar"), function(obj, :
>       the environment "pkg" is locked; cannot assign methods for
>       function "foo"
>
> (This is from R 2.5.1 on Fedora Core 5, if that matters)
>
> Background: A colleague claims to have found an error in a package.
> He and I would prefer to do some experimentation before contacting
> the authors. Subclassing is the correct way to do this, and I expect
> we will eventually subclass for other reasons, but I was wondering if
> an override was possible and easier.

If foo is a generic that you are calling directly, then you can probably define it in the global environment (omit the where arg) and test it that way. OTOH, if foo is used by pkg internally, then it will be much easier to simply edit the source for pkg, reinstall, and test. If you find and fix a bug, most package maintainers will be quite happy to integrate your fix.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
Re: [Rd] [R] Suspected memory leak with R v.2.5.x and large matrices with dimnames set
Hi Peter,

Peter Waltman [EMAIL PROTECTED] writes:
> Admittedly, this may not be the most sophisticated memory profiling
> performed, but when using unix's top command, I'm noticing a notable
> memory leak when using R with a large matrix that has dimnames set.

I'm not sure I understand what you are reporting. One thing to keep in mind is that how memory released by R is handled is OS dependent, and one will often observe that after R frees some memory, the OS does not report that amount as now free.

Is what you are observing preventing you from getting things done, or is it just a concern that there is a leak that needs fixing?

It is worth noting that the internal handling of character vectors has changed in R-devel, and so IMO testing there would make sense before pursuing this further; I suspect your results will be different.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
Re: [Rd] package dependencies
Zhenhuan Cui [EMAIL PROTECTED] writes: I created an add-on R package. In this package, there is a line require(pckgname), because I need to call some functions in pckgname. My package is successfully built and can be successful installed. But R CMD check can not be executed. The error message is: Instead of require(pkgname), simply list pkgname in the Depends field of your package's DESCRIPTION file. See the Writing R Extensions manual for details. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Convert multiple C strings into an R character vector
Jonathan Zhou [EMAIL PROTECTED] writes:
> I was hoping someone could tell me how to convert multiple C
> character strings into an R character vector.

Here's a quick untested sketch:

    char **yourStrings;
    int numStrings = /* the length of yourStrings */;
    int i;
    SEXP cvect;

    PROTECT(cvect = allocVector(STRSXP, numStrings));
    for (i = 0; i < numStrings; i++) {
        SET_STRING_ELT(cvect, i, mkChar(yourStrings[i]));
    }
    UNPROTECT(1);
    return cvect;

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
Re: [Rd] Using R_MakeExternalPtr
Jonathan Zhou [EMAIL PROTECTED] writes:
> Hi all,
>
> I've been writing a package and I've run into a problem that I'm
> unsure how to solve. I am looking to pass a C++ class object to R so
> that it may be passed back to another C++ function later on to be
> used. I'm quite new to R and this is my first time writing a package,
> so I hope you can bear with me. The following is how I create the
> class and use R_MakeExternalPtr(). This occurs in a function called
> soamInit:
>
>     Session* sesPtr = conPtr->createSession(attributes);
>     void* temp = session;

It isn't clear from your example, are you sure that temp is valid at this point?

>     SEXP out = R_MakeExternalPtr(temp, R_NilValue, R_NilValue);

I was expecting to see:

    SEXP out = R_MakeExternalPtr((void *)sesPtr, R_NilValue, R_NilValue);

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
Re: [Rd] X11() dies in remote background
Vincent Carey 525-2265 [EMAIL PROTECTED] writes: this is not a problem with R but a request for related advice. i am trying to run a lengthy batch job from my home. the OS is ... Linux jedi.bwh.harvard.edu 2.4.22-openmosix1smp #1 SMP Fri Sep 5 01:05:37 CEST 2003 i686 athlon i386 GNU/Linux i start the job and put it in the background. while i am connected, all is well. eventually my ISP shuts down the connection if i do not do any input. One thing you might try is using screen. The screen program lets you multiplex terminals in a single window, but the feature you want here is that it allows you to detach and reattach to a session. So you could start a screen session at work or home, start something running, detach, and then come back later and attach to see how things are going. However, screen may further complicate your desire to use X11(), but perhaps with Xvfb run from the screen session things will work. Do all of the graphics devices require access to X11()? I thought you could use pdf() for example, without X11() but I'm not certain. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
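[The screen workflow sketched as a session transcript; the session and script names are invented:]

```
$ screen -S rjob          # start a named screen session on the server
$ R CMD BATCH long-job.R  # launch the batch job inside it
  # press Ctrl-a d to detach; the job keeps running after the ISP
  # drops the connection or you log out
$ screen -r rjob          # later, from any new login: reattach and
                          # inspect progress
```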
[Rd] dict package: dictionary data structure for R
Hi all, The dict package provides a dictionary (hashtable) data structure much like R's built-in environment objects, but with the following differences: - The Dict class can be subclassed. - Four different hashing functions are implemented and the user can specify which to use when creating an instance. I'm sending this here as opposed to R-packages because this package will only be of interest to developers and because I'd like to get feedback from a slightly smaller community before either putting it on CRAN or retiring it to /dev/null. The design makes it fairly easy to add additional hashing functions, although currently this must be done in C. If nothing else, this package should be useful for evaluating hashing functions (see the vignette for some examples). Source: R-2.6.x: http://userprimary.net/software/dict_0.1.0.tar.gz R-2.5.x: http://userprimary.net/software/dict_0.0.4.tar.gz Windows binary: R-2.5.x: http://userprimary.net/software/dict_0.0.4.zip + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] dict package: dictionary data structure for R
Gabor Grothendieck [EMAIL PROTECTED] writes:
> Although the proto package is not particularly aimed at hashing note
> that it covers some of the same ground and also is based on a well
> thought out object model (known as object-based programming or
> prototype programming).

Interesting. The dict package differs from proto in that it _is_ aimed at hashing and:

- It is S4 based
- It does not use R's environment objects to implement its hashtables
  (proto uses environments).

In Bioconductor, we have many hashtables where the key is an Affymetrix probeset ID. These look sort of like "1000_at". It turns out that the algorithm used by R's environments is not very good at hashing these values. The dict package lets you investigate this:

    library(dict)
    keys2 = paste(seq(1000, length=13000), "at", sep="_")

    # here, hash.alg=0L corresponds to the hashing function used by R's
    # environments. I know, a name would be better.
    summary(as.integer(table(hashCodes(keys=keys2, hash.alg=0L, size=2^14))))
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        800    1100    1500    1625    2025    2700

    # hash.alg=1L is djb2 from here: http://www.cse.yorku.ca/~oz/hash.html
    summary(as.integer(table(hashCodes(keys=keys2, hash.alg=1L, size=2^14))))
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.000   1.000   2.000   1.648   2.000   4.000

    # and this is what we see with an environment:
    e = new.env(hash=T, size=2^14)
    for (k in keys2) e[[k]] = k
    summary(env.profile(e)$counts)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.0000  0.0000  0.0000  0.7935  0.0000 2700.0000

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
Re: [Rd] S4 coerce
Paul Gilbert [EMAIL PROTECTED] writes:
> (I am not sure if this is a bug or a request for a more
> understandable warning, or possibly something obvious I should be
> posting on r-help.)
>
> I am trying to coerce a new class object to be a DBIConnection and it
> does not work the way I think it should:
>
>     R version 2.5.1 (2007-06-27)
>     ...
>     require(RMySQL)  # or require(RSQLite)
>     Loading required package: RMySQL
>     Loading required package: DBI
>     [1] TRUE
>     m <- dbDriver("MySQL")  # or m <- dbDriver("SQLite")
>     con <- dbConnect(m, dbname="test")
>     dbGetQuery(con, "create table zzz (
>     +   vintage VARCHAR(20) NOT NULL,
>     +   alias VARCHAR(20) default NULL,
>     +   Documentation TEXT,
>     +   PRIMARY KEY (vintage)
>     + );")
>     NULL
>     dbListTables(con)
>     [1] "zzz"
>     setClass("TSconnection", representation(con="DBIConnection",
>     +   vintage = "logical",
>     +   panel = "logical")
>     + )
>     [1] "TSconnection"
>     setAs("TSconnection", "DBIConnection", def = function(from) from@con)

I think things work as you expect up until this point.

>     setIs("TSconnection", "DBIConnection", coerce = function(x) x@con)

I'm confused about what you want to do here. If you want TSconnection to be a DBIConnection, why wouldn't you use inheritance?

    setClass("TSconnection", contains="DBIConnection", ...)

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
Re: [Rd] Getting param names of primitives
Prof Brian Ripley [EMAIL PROTECTED] writes: My problem is that if we make formals() work on primitives, people will expect formals(log) <- value to work, and it cannot. But it could give an informative error message. Asking for formals() seems to make sense, so making it work seems like a good idea. I'll agree that making it work might encourage someone to try formals<-(), but the fact that formals<-() cannot do anything but error seems like a strange reason not to make formals() work. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
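To make the asymmetry concrete, a small sketch (behavior as of the R versions discussed in this thread, where formals() on a primitive returned NULL; args() was the usual workaround):

```r
## log is a primitive, so at the time of this thread formals(log)
## returned NULL. args() builds a dummy closure with the documented
## argument list, whose formals can then be inspected.
formals(log)          # NULL for a primitive
args(log)             # function (x, base = exp(1)) NULL
formals(args(log))    # workaround: a pairlist with elements 'x' and 'base'
```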
Re: [Rd] step() in sink() and Sweave()
Jari Oksanen [EMAIL PROTECTED] writes: On Wed, 2007-06-13 at 13:33 +0100, Gavin Simpson wrote: Dear Developers, This change also just bit me whilst updating Sweave documents for some computer classes. Is there a work-around that can be employed so that we get both the message() bits and the print() bits in the same place for our Sweave files? If not, is there any point in filing this as a bug in R? I see there have been no (public) responses to Jari's email, yet the change is rather annoying, and I do not see the rationale for printing different parts of the output from step() in two different ways. I think this is a bug. You should not use message() with optional trace. The template for the usage in step() is first if (trace) message() and later if (trace) print(). If you specifically request printing by setting trace = TRUE, then you should not get message(). Interestingly, message() seems to be a warning() that cannot be suppressed by setting options. message is a condition and so is a warning. This means you have some control over them. For example, you can create a wrapper for step that uses withCallingHandlers to cat out all messages (or print them, or email them to your friends :-)

mystep <- function(object, scope, scale = 0,
                   direction = c("both", "backward", "forward"),
                   trace = 1, keep = NULL, steps = 1000, k = 2, ...)
{
    withCallingHandlers(step(object=object, scope=scope, scale=scale,
                             direction=direction, trace=trace, keep=keep,
                             steps=steps, k=k, ...),
                        message=function(m) {
                            cat(conditionMessage(m))
                        })
}

This is so annoying that I haven't updated some of my Sweave documents. It is better to have outdated documents than crippled documents. I'm not trying to argue that the function shouldn't change, but if it is so annoying, you can also resolve this problem by defining your own step function and calling it (forgetting about withCallingHandlers). Clearly not ideal, but at the same time in the spirit of open source, no? 
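One refinement worth noting for the wrapper idea above: with withCallingHandlers() the default message handling still runs after your handler, so the text can appear twice; invoking the "muffleMessage" restart suppresses the default emission. A minimal sketch (function name hypothetical):

```r
catMessages <- function(expr) {
  withCallingHandlers(expr,
    message = function(m) {
      cat(conditionMessage(m))        # stdout, so sink()/Sweave capture it
      invokeRestart("muffleMessage")  # suppress the default stderr copy
    })
}

catMessages(message("step 1 done"))   # emitted once, via cat()
```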
+ seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] PATCH: install inst/ before doing lazyload on Windows
Seth Falcon [EMAIL PROTECTED] writes: On Windows, package files in the inst/ subdir are installed after the lazyload creation. This differs from Linux where inst/ is installed _before_ lazyload creation. Since packages may need data in inst, I think the order on Windows should be changed. Perhaps like this: This has been fixed in R-devel and R-patched. Thanks! + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] PATCH: install inst/ before doing lazyload on Windows
Hi, On Windows, package files in the inst/ subdir are installed after the lazyload creation. This differs from Linux where inst/ is installed _before_ lazyload creation. Since packages may need data in inst, I think the order on Windows should be changed. Perhaps like this: diff --git a/src/gnuwin32/MakePkg b/src/gnuwin32/MakePkg index 57af321..868e8f1 100644 --- a/src/gnuwin32/MakePkg +++ b/src/gnuwin32/MakePkg @@ -74,10 +74,10 @@ all: @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg -s nmspace @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg Dynlib @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg -s R + @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg -s $(DPKG)/demo $(DPKG)/exec $(DPKG)/inst $(DATA) ifeq ($(strip $(LAZY)),true) @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg -s lazyload endif - @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg -s $(DPKG)/demo $(DPKG)/exec $(DPKG)/inst $(DATA) ifeq ($(strip $(LAZYDATA)),true) @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg -s lazydata endif -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] package check note: no visible global function definition (in functions using Tcl/Tk)
Prof Brian Ripley [EMAIL PROTECTED] writes: It seems that it happens if package tcltk is missing from the Depends: list in the DESCRIPTION file. I just tested with Amelia and homals and that solved the various warnings in both cases. Adding tcltk to Depends may not always be the desired solution. If tcltk is already in Suggests, for example, and the intention is to optionally provide GUI features, then the code may be correct as-is. That is, codetools will issue the NOTEs if you have a function that looks like:

f <- function() {
    if (require(tcltk)) {
        someTcltkFunctionHere()
    } else {
        otherwiseFunction()
    }
}

There are a number of packages in the BioC repository that provide such optional features (not just for tcltk) and it would be nice to have a way of declaring the use such that the NOTE is silenced. [Note 1: I don't have any ideas at the moment for how this could work.] [Note 2: Despite the false-positives, I've already caught a handful of bugs by reading over these NOTEs and think they provide a lot of value to the check process] + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] missing IntegerFromString()
Aniko Szabo [EMAIL PROTECTED] writes: I am sorry about the confusion, I was too hasty. asInteger(coerceVector(x, INTSXP)) does not work after all. Here are more details of what I am trying to accomplish: I have a matrix with column names that are actually known to be integers (because I set them so myself in the R code, say, colnames(mat) <- 1:10. Of course, they become converted to character strings.) The relevant part of my code used to be:

SEXP MyFunction(SEXP mat);
int warn, minY;
SEXP rl, cl;
char *rn, *cn;
GetMatrixDimnames(mat, &rl, &cl, &rn, &cn);
minY = IntegerFromString(VECTOR_ELT(cl, 0), &warn);
if (warn > 0) error("Names of popmatrix columns are not integers");

Running some tests it appears that VECTOR_ELT(cl,0) is CHARSXP (which I wound up using without even knowing it). I tried replacing the IntegerFromString part with both asInteger(VECTOR_ELT(cl,0)) and with asInteger(coerceVector(VECTOR_ELT(cl,0), INTSXP)), but as you surmised, since VECTOR_ELT(cl,0) is CHARSXP, it does not work. So, how could I get the actual values in the column names? How about:

SEXP colnums;
int *ivals;
PROTECT(colnums = coerceVector(cl, INTSXP));
ivals = INTEGER(colnums);

Here you convert the STRSXP cl into an INTSXP. If you want the actual integer values, use the ivals pointer. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
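At the R level the same conversion is just as.integer() on the dimnames, which can serve as a quick sanity check of the data before dropping to C (illustrative only):

```r
mat <- matrix(0, nrow = 2, ncol = 10)
colnames(mat) <- 1:10                # stored as the strings "1" .. "10"
minY <- as.integer(colnames(mat))[1]
minY                                 # 1, recovered from the character name
## A non-numeric name would yield NA (with a coercion warning),
## mirroring the warn flag checked in the C code above.
```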
Re: [Rd] HTML vignette browser
Friedrich Leisch [EMAIL PROTECTED] writes: Looks good to me, and certainly something worth being added to R. 2 quick (related) comments: 1) I am not sure if we want to include links to the LaTeX sources by default, those might confuse unsuspecting novices a lot. Perhaps make those optional using an argument to browseVignettes(), which is FALSE by default? I agree that the Rnw could confuse folks. But I'm not sure it needs to be hidden or turned off by default... If the .R file was also included then it would be less confusing, I suspect, as the curious could deduce what the Rnw is about by triangulation. 2) Instead of links to .Rnw files we may want to include links to the R code - should we R CMD INSTALL a tangled version of each vignette such that we can link to it? Of course it is redundant information given the .Rnw, but we also have the help pages in several formats ready. Including, by default, links to the tangled .R code seems like a really nice idea. I think a lot of users who find vignettes don't realize that all of the code used to generate the entire document is available to them -- I just had a question from someone who wanted to know how to make a plot that appeared in a vignette, for example. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
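Users do not have to wait for such links: the tangled code can already be produced from an installed vignette's source with Stangle(). A sketch (the vignette name and the $Dir/$File fields are assumptions based on ?vignette; paths vary by installation):

```r
## Look up an installed vignette and extract its R chunks.
## "grid" is used here only as an example of a package shipping a vignette.
v <- vignette("grid", package = "grid")
rnw <- file.path(v$Dir, "doc", v$File)   # path to the .Rnw source
Stangle(rnw)                             # writes <name>.R in the working dir
```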
Re: [Rd] Possible changes to connections
mel writes: There could be/was the same debate in C/C++. That may be just a matter of education about not forgetting to close previously opened doors! R is not C/C++. In general, one does not expect to explicitly handle memory allocation and release when programming in R. Treating connections differently, when there is no longer any technical reason to do so, is surprising. Prof Brian Ripley [EMAIL PROTECTED] writes: When I ran some tests I found 7 packages on CRAN that in their tests were not closing connections. Four of those are maintained by R-core members. Even though none were by me, I think this is too easy to forget to do! I agree that it is easy to forget. It is especially easy if one creates so-called anonymous connection references like readLines(file(path)) -- this anonymous idiom seems natural to me when coding R and it would be nice to make it work for connections. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Possible changes to connections
Hi, One more comment on this thread... Jeffrey Horner [EMAIL PROTECTED] writes: Prof Brian Ripley wrote: When I originally implemented connections in R 1.2.0, I followed the model in the 'Green Book' closely. There were a number of features that forced a particular implementation, and one was getConnection() that allows one to recreate a connection object from a number. [...] Another issue is that the current connection objects can be saved and restored but refer to a global table that is session-specific, so they lose their meaning (and perhaps gain an unintended one). What I suspect is that very few users are aware of the Green Book description and so we have freedom to make some substantial changes to the implementation. Both issues suggest that connection objects should be based on external pointers (which did not exist way back in 1.2.0). Sounds great! I would also like to see the following interface (all or in parts) added for working with connections from C. This is an update to the patch I created here: http://wiki.r-project.org/rwiki/doku.php?id=developers:r_connections_api I wanted to voice a "me too" for wanting to see an interface added for working with connections from C in package code. There are a number of places where this would be useful and provide a cleaner solution than what is possible today. The proposed interface looks useful. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] S4 assignment \alias and \usage
Paul Gilbert [EMAIL PROTECTED] writes: What is the Rd file alias and usage syntax for an S4 assignment method? I have been trying variations on

\alias{TSdoc<-,default-method}
\usage{
\S4method{TSdoc}{default}(x) <- value

but so far I have not got it right according to various codoc, etc, checks. If you have your own generic TSdoc<-, then I think you want:

\alias{TSdoc<-}
\alias{TSdoc<-,someClass,anotherClass-method}

You may not be allowed to specify usage, but I think the issue only arises when setting methods for a generic documented elsewhere. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] S4 assignment \alias and \usage
Paul Gilbert [EMAIL PROTECTED] writes: Let me back up a bit, I may be making another mistake. My code has

setGeneric("TSdoc<-", def = function(x, value) standardGeneric("TSdoc<-"),
           useAsDefault = function(x, value) { attr(x, "TSdoc") <- value; x })

setGeneric("TSdoc", def = function(x) standardGeneric("TSdoc"),
           useAsDefault = function(x) attr(x, "TSdoc"))

Aside: It seems odd to me to define such defaults. How do you know x is going to have a TSdoc attribute? -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Possible changes to connections
Prof Brian Ripley [EMAIL PROTECTED] writes: When I originally implemented connections in R 1.2.0, I followed the model in the 'Green Book' closely. There were a number of features that forced a particular implementation, and one was getConnection() that allows one to recreate a connection object from a number. I am wondering if anyone makes use of this, and if so for what? I don't see any uses of it in the Bioconductor package sources. It would seem closer to the R philosophy to have connection objects that get garbage collected when no R object refers to them. This would allow for example readLines(gzfile("foo.gz")) I think this would be a nice improvement as it matches what many people already assume happens as well as matches what some other languages do (in particular, Python). + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
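Until something like that lands, the safe idiom is to bind the connection to a name and close it with on.exit(), so it is released even if an error interrupts the read. A sketch (function name hypothetical):

```r
readLinesSafely <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))      # runs on normal return and on error alike
  readLines(con)
}
## versus the anonymous readLines(gzfile(path)), which leaves the
## connection registered in the global table after the call returns
```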
[Rd] Recent changes in R related to CHARSXPs
Hello all, I want to highlight a recent change in R-devel to the larger developeR community. As of r41495, R maintains a global cache of CHARSXPs such that each unique string is stored only once in memory. For many common use cases, such as dimnames of matrices and keys in environments, the result is a significant savings in memory (and time under some circumstances). A result of these changes is that CHARSXPs must be treated as read only objects and must never be modified in-place by assigning to the char* returned by CHAR(). If you maintain a package that manipulates CHARSXPs, you should check to see if you make such in-place modifications. If you do, the general solution is as follows: If you need a temp char buffer, you can allocate one with a new helper macro like this: /* CallocCharBuf takes care of the +1 for the \0, so the size argument is the length of your string. */ char *tmp = CallocCharBuf(n); /* manipulate tmp */ SEXP schar = mkChar(tmp); Free(tmp); You can also use R_alloc which has the advantage of not having to free it in a .Call function. The mkChar function now consults the global CHARSXP cache and will return an already existing CHARSXP if one with a matching string exists. Otherwise, it will create a new one and add it to the cache before returning it. In a discussion with Herve Pages, he suggested that the return type of CHAR(), at least for package code, be modified from (char *) to (const char *). I think this is an excellent suggestion because it will allow the compiler to alert us to package C code that might be modifying CHARSXPs in-place. This hasn't happened yet, but I'm hoping that a patch for this will be applied soon (unless better suggestions for improvement arise through this discussion :-) One other thing is worth mentioning: at present, not all CHARSXPs are captured by the cache. I think the goal is to refine things so that all CHARSXPs _are_ in the cache. 
At that point, strcmp calls can be replaced with pointer comparisons which should provide some nice speed ups. So part of the idea is that the way to get CHARSXPs is via mkChar or mkString and that one should not use allocString, etc. Finally, here is a comparison of time and memory for loading all the environments (hash tables) in Bioconductor's GO annotation data package.

## unpatched
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  168891  9.1     350000 18.7   350000 18.7
Vcells  115731  0.9     786432  6.0   425918  3.3
> library(GO)
> system.time(for (e in ls(2)) get(e))
   user  system elapsed
 51.919   1.168  53.228
> gc()
            used  (Mb) gc trigger   (Mb) max used  (Mb)
Ncells  17879072 954.9   19658017 1049.9 18683826 997.9
Vcells  31702823 241.9   75190268  573.7 53912452 411.4

## patched
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  154717  8.3     350000 18.7   350000 18.7
Vcells  133613  1.1     786432  6.0   483138  3.7
> library(GO)
> system.time(for (e in ls(2)) get(e))
   user  system elapsed
 31.166   0.736  31.998
> gc()
            used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   5837253 311.8    6910418 369.1  6193578 330.8
Vcells  16831859 128.5   45712717 348.8 39456690 301.1

Best Wishes, + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] relist, an inverse operator to unlist
Andrew Clausen [EMAIL PROTECTED] writes: Hi Seth, On Mon, May 21, 2007 at 05:15:10PM -0700, Seth Falcon wrote: I will also add that the notion of a default argument on a generic function seems a bit odd to me. If an argument is available for dispatch, I just don't see what sense it makes to have a default. In those cases, the default should be handled by the method that has a signature with said argument matching the missing class. What often does make sense is to define a generic function where some arguments are not available for dispatch. For example: setGeneric("foo", signature="flesh", function(flesh, skeleton=attr(flesh, "skeleton")) standardGeneric("foo")) That's an excellent suggestion. Thanks! However, I had to set the signature to c("numeric", "missing") rather than just "numeric". I have uploaded a new version here: http://www.econ.upenn.edu/~clausen/computing/relist.R I misunderstood. You aren't using S4 classes/methods at all and so I don't actually see how my comments could have been helpful in any way. relist seems like a really odd solution to me, but based on the discussion I guess it has its use cases. Best, + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RFC: adding an 'exact' argument to [[
Hi again, Robert has committed the proposed patch to R-devel. So [[ now has an 'exact' argument and the behavior is as described: Seth Falcon [EMAIL PROTECTED] writes: 1. [[ gains an 'exact' argument with default value NA 2. Behavior of 'exact' argument: exact=NA partial matching is performed as usual, however, a warning will be issued when a partial match occurs. This is the default. exact=TRUE no partial matching is performed. exact=FALSE partial matching is allowed and no warning issued if it occurs. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
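A quick sketch of the three behaviors (note that the exact=NA default described here was tightened in later R releases, so the unadorned call may behave differently in a current R):

```r
x <- list(alpha = 1, beta = 2)

x[["alp", exact = TRUE]]    # NULL: partial matches are refused
x[["alp", exact = FALSE]]   # 1: partial match allowed, silently
x[["alp"]]                  # under the exact=NA default above: matches, with a warning
x[["alpha"]]                # 1: exact matches always work
```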
Re: [Rd] Passing R CMD Check without data
Arjun Ravi Narayan [EMAIL PROTECTED] writes: I have a package which passes R CMD check with the --no-vignettes option. However, it does not pass the check without that, as the vignette relies on some data files that I cannot distribute. However, I would like the package to pass the check so that I can put it on CRAN, so that other people with access to the dataset can put the data into the package, and then rebuild the vignettes themselves. I would recommend having a separate vignette that uses toy data, if that is all that is available, that demonstrates the basic use of the package. A considerable part of the value of a package vignette, IMHO, is having something that (i) the user can run interactively on their own, and (ii) can be automatically checked. Your current vignette can be included as pdf (the Rnw could live in another place under inst/). You might also look at the vsn package in Bioconductor which uses a Makefile to keep R CMD check from building its vignette because it is too time consuming... + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] relist, an inverse operator to unlist
Hi Andrew, Andrew Clausen [EMAIL PROTECTED] writes: For reasons I can't explain, the code I posted worked in my session, but didn't work when I started a fresh one. standardGeneric() seems to get confused by defaults for missing arguments. It looks for a missing method with this code:

relist <- function(flesh, skeleton=attr(flesh, "skeleton")) {
    standardGeneric("relist")
}

This looks very odd to me. If you are creating an S4 generic function, why are you not calling setGeneric? Or has that part of the code simply been omitted from your post? I will also add that the notion of a default argument on a generic function seems a bit odd to me. If an argument is available for dispatch, I just don't see what sense it makes to have a default. In those cases, the default should be handled by the method that has a signature with said argument matching the missing class. What often does make sense is to define a generic function where some arguments are not available for dispatch. For example:

setGeneric("foo", signature="flesh",
           function(flesh, skeleton=attr(flesh, "skeleton"))
               standardGeneric("foo"))

+ seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
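A self-contained version of that suggestion (generic, method, and data names hypothetical), showing that skeleton stays out of dispatch yet still gets a default; repeating the default in the method keeps the two definitions in sync:

```r
setGeneric("relist2", signature = "flesh",
           function(flesh, skeleton = attr(flesh, "skeleton"))
               standardGeneric("relist2"))

setMethod("relist2", "numeric",
          function(flesh, skeleton = attr(flesh, "skeleton")) {
              ## regroup 'flesh' into pieces shaped like 'skeleton'
              split(flesh, rep(seq_along(skeleton), lengths(skeleton)))
          })

x <- 1:5
attr(x, "skeleton") <- list(a = 1:2, b = 3:5)
relist2(x)   # skeleton picked up from the attribute by default
```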
Re: [Rd] RFC: adding an 'exact' argument to [[
Bill Dunlap [EMAIL PROTECTED] writes: This sounds interesting. Do you intend to leave the $ operator alone, so it will continue to do partial matching? I suspect that that is where the majority of partial matching for list names is done. The current proposal will not touch $. I agree that most intentional partial matching uses $ (hopefully only during interactive sessions). The main benefit of our proposed change is more reliable package code. For long lists and certain patterns of use, there are also performance benefits:

kk <- paste("abc", 1:(1e6), sep="")
vv <- as.list(1:(1e6))
names(vv) <- kk
system.time(vv[["fooo", exact=FALSE]])
   user  system elapsed
  0.074   0.000   0.074
system.time(vv[["fooo", exact=TRUE]])
   user  system elapsed
  0.042   0.000   0.042

It might be nice to have an option that made x$partial warn so we would fix code that relied on partial matching, but that is lower priority. I think that could be useful as well. To digress a bit further in discussing $... I think the argument that partial matching is desirable because it saves typing during interactive sessions now has a lot less weight. The recent integration of the completion code gives less typing and complete names. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RFC: allow packages to advertise vignettes on Windows
Simon Urbanek [EMAIL PROTECTED] writes: Seth, we already *have* vignette registration in place [vignette()] and we already *have* support in the GUIs (I'm talking e.g. about the Mac GUI here which uses vignette() to build a vignettes browser). Yes, fine. I agree that vignette() provides most of what is needed in terms of the implementation details and could replace most of the code that I posted, but it doesn't mean there is nothing to do. What you propose circumvents all the mechanisms already in place and replicates the same functionality. I'll repeat my question: what is wrong with the current approach? Why do you want to add a parallel approach? What is wrong with the current approach is that, at least on Windows, vignettes are not as easily accessible as they should be. vignette() is fine as an implementation detail for GUI developers. It is a bit silly for beginning users who will have a much better chance of getting to such introductory documentation if it is part of the GUI. Gaah, I feel like a broken record. What we have had with the code in Biobase is a menu of vignettes for _attached_ packages. Given the total number of packages that could be installed and given the fact that running code in a vignette requires said package to be attached, I think this makes a lot of sense [And I think this would improve the usability of the OS X vignette browser because the list is long, the vignettes for an individual package are not sensibly ordered, etc]. A menu is not perfect, but limiting to attached packages makes it a useful solution until more robust browsers etc get to the top of someone's TODO list. But YMMV and what I've proposed does not require your OS X GUI to change _anything_. So, as a small step, I'm trying to get vignettes for attached packages to be easily accessible via the Windows GUI. I don't care all that much about the particulars -- and am certainly not attached to the code that I posted. 
What the vignette() function does not provide for is a hook such that a GUI can add the vignette info for attached packages. Comments from others in this thread suggest that there is a desire that this be an opt-in feature for package authors [I don't really understand this desire as it seems to me it should be a feature/decision of the GUI] and again vignette() doesn't help. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] RFC: allow packages to advertise vignettes on Windows
Hello, The vignette concept, which started in Bioconductor, seems to be catching on. Vignettes are supported by R CMD build/check and documented in the Writing R Extensions manual. I think vignettes are a fantastic way to introduce new users to a package. However, getting new users to realize that a vignette is available can be challenging. For some time now, we have had a function in Biobase that creates a Vignettes menu item in the R Windows GUI and gives packages a mechanism to register their vignettes so that they appear on this menu. I would like to see this functionality included in R so that there can be a standard mechanism, not dependent on Biobase, for registering a package's vignettes with one of the R GUIs (currently only Windows is supported, but I imagine the OS X GUI could also implement this). Below is the implementation we have been using. Is there an R-core member I can interest in pushing this along? I'm willing to submit a patch with documentation, etc. + seth

addVigs2WinMenu <- function(pkgName) {
    if ((.Platform$OS.type == "windows") && (.Platform$GUI == "Rgui")
        && interactive()) {
        vigFile <- system.file("Meta", "vignette.rds", package=pkgName)
        if (!file.exists(vigFile)) {
            warning(sprintf("%s contains no vignette, nothing is added to the menu bar",
                            pkgName))
        } else {
            vigMtrx <- .readRDS(vigFile)
            vigs <- file.path(.find.package(pkgName), "doc", vigMtrx[, "PDF"])
            names(vigs) <- vigMtrx[, "Title"]
            if (!("Vignettes" %in% winMenuNames()))
                winMenuAdd("Vignettes")
            pkgMenu <- paste("Vignettes", pkgName, sep="/")
            winMenuAdd(pkgMenu)
            for (i in vigs) {
                item <- sub(".pdf", "", basename(i))
                winMenuAddItem(pkgMenu, item,
                               paste("shell.exec(\"", as.character(i), "\")",
                                     sep=""))
            }
        } ## else
        ans <- TRUE
    } else {
        ans <- FALSE
    }
    ans
}

-- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RFC: allow packages to advertise vignettes on Windows
Duncan Murdoch [EMAIL PROTECTED] writes: I'm interested in making vignettes more visible. Putting them on the menu is not the only way, but since you're offering to do the work, I think it's a good idea :-). Excellent :-) A few questions: - Should packages need to take any action to register their vignettes, or should this happen automatically for anything that the vignette() function would recognize as a vignette? My recommendation would be for automatic installation. That seems ok to me. Currently, we have a system that requires package authors to register their vignette in .onAttach (more on that below). I can't really think of a case where a package provides vignettes and doesn't want them easily accessible to new users in a GUI environment. - Should it happen when the package is installed or when it is attached? This is harder. vignette() detects installed vignettes, which is fine if not many packages have them. But I think the hope is that most packages will eventually, and then I think you wouldn't want the menu to list every package. Maybe default to attached packages, but expose the function below for people who want more? My feeling is that this is only appropriate for attached packages. As you point out, adding an entry for every installed package could create a cluttered menu (and present implementation challenges to avoid slowness). I also think that packages that get loaded via other packages' name spaces should remain in stealth mode. There is another reason to only list vignettes for attached packages. One of the primary uses of a vignette is to allow the user to work through an example use case interactively. This requires the package to be attached in almost all cases. - Should they appear in a top level Vignettes menu, or as a submenu of the Help menu? I'd lean towards keeping the top level placement, since you've already got an audience who are used to that. Sounds good. 
By the way, another way to expose vignettes is to have them automatically added to the package help topic, with links in formats that support them. I think we should do that too, but I don't know if it'll happen soon. Also sounds good, but one thing at a time, I guess. If there is some agreement about vignettes being automatically added and that this only happens when a package is attached, then I can look into modifying the existing function to handle this. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Problem calling $ inside a $ method
Hello, I wonder if this will make it through the spam filters given the subject line. I'm seeing the following when trying to call a dollar method inside of a dollar method.

> setClass("Foo", representation(d = "list"))
[1] "Foo"
> f <- new("Foo", d = list(bob = 1, alice = 2))
> ## We can call dollar at this level and it works as expected
> `$`(f, bo)
[1] 1
> `$`(f, al)
[1] 2
> ## So set a method on Foo that does this
> setMethod("$", "Foo", function(x, name) `$`(x@d, name))
[1] "$"
> ## But it doesn't work. Why?
> f$bo
NULL
> f$al
NULL
> ## Here is a hackish workaround.
> setMethod("$", "Foo",
+           function(x, name) eval(substitute(x@d$FOO, list(FOO = name))))
[1] "$"
> f$bo
[1] 1
> f$al
[1] 2

Other suggestions for workarounds? Is this a bug? + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
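The root of the puzzle is that `$` quotes its second argument: inside the method, `` `$`(x@d, name) `` looks for an element literally named "name" rather than using the character value of `name`. Using `[[`, which evaluates its subscript, sidesteps the substitute()/eval() dance. A sketch with a fresh class name so it can run alongside the transcript above:

```r
setClass("Foo2", representation(d = "list"))

## `[[` evaluates its subscript, so the character value held in 'name'
## (S4 passes the selected name as a string) is used as the key.
setMethod("$", "Foo2", function(x, name) x@d[[name]])

f2 <- new("Foo2", d = list(bob = 1, alice = 2))
f2$bob     # 1
f2$alice   # 2
```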
Re: [Rd] Possible problem with S4 dispatch
Prof Brian Ripley [EMAIL PROTECTED] writes: Note that you called selectMethod(mget, signature(x="character", envir=class(LLe))) by name rather than calling the visible function mget() (which you could have supplied as fdef). I've never really got to the bottom of the complicated searches that getGeneric() uses, but the fact that it does not just look for a visible function of that name tells you it is doing something different. What I would check from your browser is what parent.env() shows, successively until you get to the imports and then the base namespace. If mget is not in the imports, something would seem to be up with your importing of namespaces. find() is not relevant here as namespace scoping is in play: only if the mget generic is imported will it take precedence over base:::mget. (It is not clear to me what is being browsed here, and hence what namespaces are in play.) This was helpful. It seems that the strange behavior I was seeing was due to stale package installations. After reinstalling the package and all of its depends and imports, things are looking more normal. I used the following function to examine the chain of parent environments while debugging:

showEncEnvs <- function() {
    etmp <- parent.env(parent.frame())
    while (TRUE) {
        ename <- environmentName(etmp)
        cat(sprintf("Found environment: '%s'\n", ename))
        if (exists("mget", etmp, inherits=FALSE))
            cat("found mget\n")
        switch(ename, R_EmptyEnv=break, R_GlobalEnv=break)
        if (ename == "") {
            cat("  first five entries\n")
            print(ls(etmp)[1:5])
        }
        etmp <- parent.env(etmp)
    }
}

One thing to note: One might expect each import to be in the chain of parent environments. Instead all imports are merged into a single environment that is the parent of the package env. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel