Re: [Rd] SVN vs DVCS
On 5/26/10 4:16 AM, Gabor Grothendieck wrote: Note that one can also use any of the dvcs systems without actually moving from svn by using the dvcs (or associated extension/addon) as an svn client or by using it on an svn checkout.

FWIW, I have been using git for several years now as my VCS of choice and use it for all svn-backed projects (R included) via git-svn. Some of the things I like:

- Being able to organize changes in local commits that can be revised, reordered, and rebased prior to publishing. Once I got in the habit of working this way, I simply can't imagine going back.
- Having quick access to the full repository history without network access/delay. Features for searching change history are more powerful (or easier for me to use) and I have found that useful as well.
- This may not be true any longer with more recent svn servers/clients, but aside from the initial repo clone, working via git-svn was noticeably faster than the straight svn client (!) -- I think related to how the tools organize the working copy and how many fstat calls they make.
- I find the log reviewing functionality much better suited to reviewing changes.

+ seth

-- Seth Falcon | @sfalcon | http://userprimary.net/
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Resolving functions using R's namespace mechanism can double runtime
On 4/27/10 1:16 PM, Dominick Samperi wrote: It appears that the runtime for an R script can more than double if a few references to a function foo() are replaced by more explicit references of the form pkgname::foo(). The more explicit references are of course required when two loaded packages define the same function. I can understand why use of this mechanism is not free in an interpreted environment like R, but the cost seems rather high.

`::` is a function, so there is going to be overhead. OTOH, there is no reason to pay for the lookup more than once. For example, at startup you could do:

    myfoo <- pkgname::foo

and then later call myfoo(); I don't think you will see the added cost. You can formalize the above approach in package code by renaming the function in the importFrom directive, where I believe you can do:

    importFrom(pkgname, myfoo=foo)

+ seth
Re: [Rd] suggestion how to use memcpy in duplicate.c
On 4/21/10 10:45 AM, Simon Urbanek wrote: Won't that miss the last incomplete chunk? (and please don't use DATAPTR on INTSXP even though the effect is currently the same) In general it seems that it depends on nt whether this is efficient or not, since calls to short memcpy are expensive (very small nt, that is). I ran some empirical tests to compare memcpy vs for() (x86_64, OS X) and the results were encouraging - depending on the size of the copied block the difference could be quite big:

- tiny block (ca. n = 32 or less) - for() is faster
- small block (n ~ 1k) - memcpy is ca. 8x faster
- as the size increases the gap closes (presumably due to RAM bandwidth limitations), so for n = 512M it is ~30%.

Of course this is contingent on the implementation of memcpy, compiler, architecture etc., and it will only matter if copying is what you do most of the time ...

Copying of vectors is something that I would expect to happen fairly often in many applications of R. Is for() faster on small blocks by enough that one would want to branch based on size?

+ seth
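If the answer turns out to be yes, the size-based branch could look roughly like the following minimal C sketch. This is illustrative only, not code from duplicate.c; the threshold constant is a placeholder to be tuned by measurement on the target platform, not a value established in this thread.

```c
#include <string.h>
#include <stddef.h>

/* Placeholder cutoff; the thread suggests the crossover is somewhere
 * around n = 32 on one x86_64/OS X setup, but it is platform-dependent. */
#define SMALL_COPY_THRESHOLD 32

/* Copy nt ints, using a plain loop for small blocks and memcpy for
 * larger ones. Both branches produce identical results; only the
 * speed differs. */
static void copy_ints(int *dst, const int *src, size_t nt)
{
    if (nt < SMALL_COPY_THRESHOLD) {
        for (size_t i = 0; i < nt; i++)
            dst[i] = src[i];
    } else {
        memcpy(dst, src, nt * sizeof(int));
    }
}
```

Whether the extra branch pays for itself is exactly the empirical question raised above; the only way to decide is to time both forms over the block-size distribution seen in real workloads.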
Re: [Rd] transient memory allocation and external pointers
On 4/20/10 6:24 AM, Melissa Jane Hubisz wrote: Thanks for the responses. Seth's example is indeed what I was trying (hoping) to do, and it seems to work fine on my system (ubuntu x86_64, R 2.10.1). But if it doesn't work for him, then that definitely answers my question. I guess I'll have to go the Calloc/Free route.

I expect that you could get your approach to not work on your system as well; you just have to try harder ;-) Memory-related bugs can be quite tricky, because incorrect code may run fine most of the time. To trigger a problem, you need the right pattern of allocation such that data will be written over the memory that your invalid external pointer points to.

+ seth
Re: [Rd] transient memory allocation and external pointers
On 4/19/10 8:59 AM, Simon Urbanek wrote: On Apr 19, 2010, at 10:39 AM, Melissa Jane Hubisz wrote: Hello, The Writing R Extensions manual section 6.1.1 describes the transient memory allocation function R_alloc, and states that memory allocated by R_alloc is automatically freed after the .C or .Call function is completed. However, based on my understanding of R's memory handling, as well as some test functions I have written, I suspect that this is not quite accurate. If the .Call function returns an external pointer to something created with R_alloc, then this object seems to stick around after the .Call function is completed, and is subject to garbage collection once the external pointer object is removed.

Yes, because the regular rules for the lifetime of an R object apply since it is in fact an R object. It is subject to garbage collection, so if you assign it anywhere its lifetime will be tied to that object (in your example an EXTPTRSXP).

I may be misunderstanding the question, but I think the answer is actually that it is *not* safe to put memory allocated via R_alloc into the external pointer address of an EXTPTRSXP. Here's what I think Melissa is doing:

    SEXP make_test_xp(SEXP s)
    {
        SEXP ans;
        const char *s0 = CHAR(STRING_ELT(s, 0));
        char *buf = (char *) R_alloc(strlen(s0) + 1, sizeof(char));
        memcpy(buf, s0, strlen(s0) + 1);
        ans = R_MakeExternalPtr(buf, R_NilValue, R_NilValue);
        return ans;
    }

The memory allocated by R_alloc is released at the end of the .Call via vmaxset(vmax). Using R_alloc in this way will lead to memory corruption (it does for me when I made a simple test case). For memory that really is external (not a SEXP), you should instead use Calloc and register a finalizer for the external pointer that will do any required cleanup and then call Free. If instead you want to have an externally managed SEXP, you could put it in the protected slot of the external pointer, but then you should allocate it using standard R allocation functions.
+ seth
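The failure mode is easier to see with a toy arena allocator. This is an analogy only, not R's actual implementation: the arena reset plays the role of vmaxset at the end of the .Call, and a pointer stashed across the reset silently sees whatever the next allocation writes.

```c
#include <string.h>
#include <stddef.h>

/* Toy bump allocator: hand out bytes from a fixed buffer. */
static char arena[1024];
static size_t arena_top = 0;

static void *arena_alloc(size_t n)
{
    void *p = arena + arena_top;
    arena_top += n;
    return p;
}

/* Analogue of vmaxset(vmax) at the end of a .Call: every pointer
 * obtained since the matching "vmaxget" is now dangling. */
static void arena_reset(void)
{
    arena_top = 0;
}
```

A pointer saved before arena_reset() and dereferenced afterwards still points into the buffer, so nothing crashes immediately; it simply reads whatever was allocated next, which is exactly the kind of silent corruption described above.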
Re: [Rd] generic '[' for a non-exported class
On 4/7/10 1:09 AM, Christophe Genolini wrote: Hi all, I define an S4 class 'foo'. I define '[' and '[<-' for it. I do not want to export foo, so I do not put it in the NAMESPACE. I do not want to export '[' and '[<-' either (since the user cannot use foo, there is no reason to give him access to '[' for foo). But R CMD check does not agree with me and reports an error: Undocumented S4 methods: generic '[' and siglist 'foo'; generic '[<-' and siglist 'foo'. Any solution?

You can document these on an internal API Rd page. Create an Rd file like yourPkg-internal-api.Rd and add the appropriate \alias{} lines to it.

+ seth
Re: [Rd] as(1:4, numeric) versus as.numeric(1:4, numeric)
On 3/31/10 4:52 PM, John Chambers wrote: The example is confusing and debatable, but not an obvious bug.

And your presentation of it is the cause of much of the confusion (unintentionally I'm sure). To restate the issue (I think): In a new R session, if you happen to call

    selectMethod("coerce", c("integer", "numeric"))

*before* having made a call like as(1:4, "numeric"), then there is a side-effect of creating definition A of the integer -> numeric coerce method. From this point forward, all calls to as(x, "numeric") when x is integer will return as.numeric(x). If instead you do not call selectMethod, then when calling as(x, "numeric") for x integer you get definition B, the documented behavior, which simply returns x. Presumably there are other similar cases where this will be an issue.

So while I agree this could be considered obscure, it qualifies as a bug in my book. It seems desirable that selectMethod not change the state of the system in a user-visible fashion. And calling selectMethod, or any other function, should not alter dispatch unless documented to do so.

I'm also suspicious of the behavior of the strict argument:

    > class(as(1:4, "numeric"))
    [1] "integer"
    > class(as(1:4, "numeric", strict = TRUE))
    [1] "integer"
    > class(as(1:4, "numeric", strict = FALSE))
    [1] "integer"

Is that intended?

+ seth
Re: [Rd] Difference Linux / Windows
On 3/31/10 1:12 PM, Christophe Genolini wrote: Hi the list, I am writing a package that happens to not be compatible with linux because I did not know that the function savePlot was available only on Windows. Is there a list of incompatible functions? How can I get this kind of information?

One way is to obtain a copy of the R sources and then grep the Rd files for '#ifdef'. I don't claim this is convenient. There has been discussion, and I believe general consensus, that we'd like to eliminate the conditional documentation. This requires editing the Rd files to make the contents sensible (you can't just remove the #ifdefs). Patches along these lines would be welcome.

+ seth
Re: [Rd] update.packages(1)
On 3/27/10 1:43 PM, Duncan Murdoch wrote: On 25/03/2010 3:16 PM, Arni Magnusson wrote: I'm relaying a question from my institute's sysadmin: Would it be possible to modify update.packages() and related functions so that 'lib.loc' accepts integer values to specify a library from the .libPaths() vector? Many Linux users want to update all user packages (inside the R_LIBS_USER directory, e.g. ~/r/library) and none of the system packages (inside the /usr directory, e.g. /usr/lib64/R/library), because they don't have write privileges to update the system packages. Currently, this can be done by pressing 'y RET' for all the user packages and 'RET' for all the system packages. This is hard work and requires careful reading when there are dozens of packages.

Another way is to run update.packages(Sys.getenv("R_LIBS_USER")) or:

    update.packages(.libPaths()[1])

You could also save some work by putting ask=FALSE, or ask="graphics", in as another argument. But isn't it easy enough to write your own function as a wrapper to update.packages, suiting your own local conventions? It seems like a bad idea to make update.packages too friendly, when there are several different friendly front-ends for it already (e.g. the menu entries in the Windows or MacOS GUIs).

But it would be nicer for the user to type update.packages(1), using a 'pos'-like notation to indicate the first element of the .libPaths() vector. --- A separate but related issue is that it would be nice if the R_LIBS_USER library would be the first library by default. Currently, my sysadmin must use Rprofile.site to shuffle the .libPaths() to make R_LIBS_USER first, which seems like a sensible default when it comes to install.packages() and remove.packages().

I'm confused. AFAICT, R_LIBS_USER _is_ put first. Following the advice in the Admin manual, I created a directory matching the default value of R_LIBS_USER (Sys.getenv("R_LIBS_USER") to see it). Then when I start R, I get:

    > .libPaths()
    [1] "/home/sfalcon/R/x86_64-unknown-linux-gnu-library/2.11"
    [2] "/home/sfalcon/build/rd/library"

Isn't that what you want?
Re: [Rd] list_files() memory corruption?
On 3/20/10 2:03 PM, Seth Falcon wrote: On 3/20/10 1:36 PM, Alistair Gee wrote: I fixed my build problems. I also noticed that my patch wasn't correct, so I have attached a new version. This fix still grows the vector by doubling it until it is big enough, but the length is reset to the correct size at the end once it is known. This fix differs from the existing fix in subversion in the following scenario:

1. Create file Z in a directory with one other file named Y.
2. Call dir() to retrieve the list of files.
3. dir() counts 2 files.
4. While dir() is executing, some other process creates file X in the directory.
5. dir() retrieves the list of files, stopping after 2 files. But by chance, it retrieves files X and Y (but not Z).
6. dir() returns files X and Y, which could be misinterpreted to mean that file Z does not exist.

In contrast, with the attached fix, dir() would return all 3 files.

I think the scenario you describe could happen with either version. Once you've read the files in a directory, all bets are off. Anything could happen between the time you readdir() and return results back to the user. I agree, though, that avoiding two calls to readdir narrows the window.

Also, the existing fix in subversion doesn't seem to handle the case where readdir() returns fewer files than were originally counted, as it doesn't decrease the length of the vector.

Yes, that's a limitation of the current fix. Have you run 'make check-devel' with your patch applied? Have you run any simple tests for using dir() or list.files() with recursive=TRUE on a reasonably large directory and compared times and memory use reported by gc()? It is often the case that writing the patch is the easy/quick part, and making sure that one hasn't introduced new infelicities or unintended behavior is the hard part. I will try to take another look at your latest patch.

I've applied a modified version of your patch. In the testing that I did, avoiding the counting step resulted in almost 2x faster times for large directory listings with recursive=TRUE, at the cost of a bit more memory. The code also now includes a check for user interrupt, so that you can C-c out of a dir/list.files call more quickly. Thanks for putting together the patch.

+ seth
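The grow-then-trim strategy discussed in this thread can be sketched with a plain C dynamic array. This is an illustrative sketch only, not the code from platform.c: capacity doubles as items are appended (amortized O(1) per append, at the cost of up to 2x transient memory, which is the trade-off questioned above), and a final trim resets the length to the true count.

```c
#include <stdlib.h>

/* Illustrative growable int array; names are not R's internals. */
typedef struct {
    int *data;
    size_t len;   /* items stored           */
    size_t cap;   /* items allocated        */
} IntVec;

static void vec_push(IntVec *v, int x)
{
    if (v->len == v->cap) {
        size_t newcap = v->cap ? v->cap * 2 : 16;  /* double on overflow */
        int *p = realloc(v->data, newcap * sizeof(int));
        if (!p) abort();                           /* toy error handling */
        v->data = p;
        v->cap = newcap;
    }
    v->data[v->len++] = x;
}

/* Once the true count is known, shrink the allocation to fit. */
static void vec_trim(IntVec *v)
{
    if (v->cap == v->len)
        return;
    int *p = realloc(v->data, v->len * sizeof(int));
    if (p)
        v->data = p;
    v->cap = v->len;
}
```

In R's case the "array" is a STRSXP grown via lengthgets, so each growth step also copies SEXP data, which is why the thread weighs the single-pass approach against the extra allocation and copying.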
Re: [Rd] Suggestion: Not having to export .conflicts.OK in name spaces
On 3/22/10 3:57 AM, Martin Maechler wrote: SF == Seth Falcon s...@userprimary.net on Fri, 19 Mar 2010 13:47:17 -0700 writes:

SF On 3/17/10 9:11 AM, Henrik Bengtsson wrote: Currently library() and attach() fail to locate an existing '.conflicts.OK' in a package with a name space, unless it is exported. Since there should be little interest in exporting '.conflicts.OK' otherwise, one may argue that those methods should look for '.conflicts.OK' even if it is not exported.

SF I guess I agree that there is no real value in forcing .conflicts.OK to be exported.

so do I. So I guess we agree that Henrik's patch would be worth applying. @Henrik: if you resend your patch with the additions for attach, I will see about putting it in.

SF OTOH, this seems like a dubious feature to begin with. When is it a good idea to use it?

in cases where the package author thinks (s)he knows what (s)he is doing; e.g. in the case of Matrix, I could argue that I know about the current conflicts, and I would *not* want the users of my package to be intimidated by warnings about maskings...

I can't say that this convinces me that .conflicts.OK is OK. Are there package authors who realize they do not know what they are doing well enough to keep the warning messages? :-P

+ seth
Re: [Rd] list_files() memory corruption?
On 3/20/10 1:36 PM, Alistair Gee wrote: I fixed my build problems. I also noticed that my patch wasn't correct, so I have attached a new version. This fix still grows the vector by doubling it until it is big enough, but the length is reset to the correct size at the end once it is known. This fix differs from the existing fix in subversion in the following scenario:

1. Create file Z in a directory with one other file named Y.
2. Call dir() to retrieve the list of files.
3. dir() counts 2 files.
4. While dir() is executing, some other process creates file X in the directory.
5. dir() retrieves the list of files, stopping after 2 files. But by chance, it retrieves files X and Y (but not Z).
6. dir() returns files X and Y, which could be misinterpreted to mean that file Z does not exist.

In contrast, with the attached fix, dir() would return all 3 files.

I think the scenario you describe could happen with either version. Once you've read the files in a directory, all bets are off. Anything could happen between the time you readdir() and return results back to the user. I agree, though, that avoiding two calls to readdir narrows the window.

Also, the existing fix in subversion doesn't seem to handle the case where readdir() returns fewer files than were originally counted, as it doesn't decrease the length of the vector.

Yes, that's a limitation of the current fix. Have you run 'make check-devel' with your patch applied? Have you run any simple tests for using dir() or list.files() with recursive=TRUE on a reasonably large directory and compared times and memory use reported by gc()? It is often the case that writing the patch is the easy/quick part, and making sure that one hasn't introduced new infelicities or unintended behavior is the hard part. I will try to take another look at your latest patch.

+ seth
Re: [Rd] DESCRIPTION: Imports: assertion of version?
On 3/19/10 6:13 AM, Henrik Bengtsson wrote: Hi, from 'Writing R Extensions' [R version 2.11.0 Under development (unstable) (2010-03-16 r51290)] one can read: The optional `Imports' field lists packages whose name spaces are imported from but which do not need to be attached. [...] Versions can be specified, but will not be checked when the namespace is loaded. Is it a design decision that version specifications are not asserted for packages under Imports:, or is it a lack of implementation? If a design decision, under what use cases would you want to specify the version but not validate it? Is it simply because there is no mechanism for tracking the origin/package of the code importing the other package, and hence we cannot know which DESCRIPTION file to check against?

I'm not aware of any use case in which the current lack of checking is a feature. I would be interested in a patch (with testing) for this.

+ seth
Re: [Rd] Suggestion: Not having to export .conflicts.OK in name spaces
On 3/17/10 9:11 AM, Henrik Bengtsson wrote: Currently library() and attach() fail to locate an existing '.conflicts.OK' in a package with a name space, unless it is exported. Since there should be little interest in exporting '.conflicts.OK' otherwise, one may argue that those methods should look for '.conflicts.OK' even if it is not exported.

I guess I agree that there is no real value in forcing .conflicts.OK to be exported.

OTOH, this seems like a dubious feature to begin with. When is it a good idea to use it?

+ seth
Re: [Rd] list_files() memory corruption?
On 3/17/10 7:16 AM, Alistair Gee wrote: Yes. I had noticed that R occasionally segfaults (especially when I run many concurrent R processes), so I used valgrind to log every use of R. In the valgrind logs, I tracked the problem to list_files(). I attached a patch to platform.c (for trunk). Unfortunately, I am having trouble building R from the subversion trunk--it is taking a very long time decompressing/installing the recommended packages--so I haven't been able to verify the fix yet. But my version of platform.c does compile, and it does simplify the code b/c count_files() is no longer needed.

Hmm, I see that you grow the vector containing filenames by calling lengthgets and doubling the length. I don't see where you clean up before returning -- it seems likely you will end up returning a vector that is too long. And there are some performance characteristics to consider in terms of both run time and memory profile. Does making a single pass through the files make up for the allocations/data copying that result from lengthgets? Is it worth possibly requiring twice the memory in the worst case?

+ seth
Re: [Rd] Segfault Problem c++ R interface (detailed)
Hi, The first thing to observe is that you are calling RSymbReg via .Call, but that function does not return a SEXP, as is required by the .Call interface.

+ seth
Re: [Rd] list_files() memory corruption?
Hi Alistair, On 3/12/10 4:37 PM, Alistair Gee wrote: I am using R-2-10 from subversion. In the implementation of do_listfiles() in platform.c, it appears to allocate a vector of length count, where count is calculated by count_files(). It then proceeds to call list_files(), passing in the vector but not the value of count. Yet list_files() doesn't seem to check the length of the vector that was allocated. What happens if a new file is added to the file system between the call to count_files() and list_files()? Doesn't this write past the length of the allocated vector?

Good catch. I've added a length check to prevent a problem.

Cheers, + seth
Re: [Rd] list_files() memory corruption?
On 3/15/10 8:37 PM, Alistair Gee wrote: I think I have a fix that avoids the problem by just growing the vector as necessary as the directory is traversed (and no longer uses count_files()). I don't have access to the code at the moment, but I should be able to post the patch tomorrow. Is there interest in my patch?

I'm curious to know if this is a problem you have encountered while using R. My initial thought is that there isn't much benefit in making this part of the code smarter. If your patch simplifies things, I'd be more interested.

+ seth
Re: [Rd] [PATCH] R ignores PATH_MAX and fails in long directories (PR#14228)
On 3/11/10 12:45 AM, Henrik Bengtsson wrote: Thanks for the troubleshooting. I just want to second this patch; it would be great if PATH_MAX could be used everywhere.

The patch, or at least something quite similar, was applied in r51229.

+ seth
Re: [Rd] shash in unique.c
On 3/5/10 4:40 AM, Matthew Dowle wrote: Thanks a lot. Quick and brief responses below...

Duncan Murdoch murd...@stats.uwo.ca wrote in message news:4b90f134.6070...@stats.uwo.ca... Matthew Dowle wrote: I was hoping for a 'yes', 'no', 'maybe' or 'bad idea because ...'. No response resulted in a retry() after a Sys.sleep(10 days). If it's a yes or maybe then I could proceed to try it, test it, and present the test results and timings to you along with the patch. It would be on 32-bit Ubuntu first, and I would need to either buy, rent time on, or borrow a 64-bit machine to be able to then test there, owing to the nature of the suggestion. If it's a no, bad idea because..., or we were already working on it, or better, then I won't spend any more time on it. Matthew

Matthew Dowle mdo...@mdowle.plus.com wrote in message news:hlu4qh$l7...@dough.gmane.org... Looking at shash in unique.c, from R-2.10.1, I'm wondering if it makes sense to hash the pointer itself rather than the string it points to? In other words, could the SEXP pointer be cast to unsigned int and the usual scatter be called on that as if it were integer?

Two negative but probably not fatal issues: Pointers and ints are not always the same size. In Win64, ints are 32 bits, pointers are 64 bits. (Can we be sure there is some integer type the same size as a pointer? I don't know, ask a C expert.)

No, we can't be sure. But we could test at runtime, and if the assumption wasn't true, then revert to the existing method.

I think the idea is, on the whole, a reasonable one, and I would be inclined to apply a patch if it demonstrated some measurable performance improvement. For the 32-bit vs 64-bit issue, I think we could detect it and in the 64-bit case take something like:

    ((int) p) ^ ((int)(p >> 32))

We might want to save the hash to disk. On restore, the pointer-based hash would be all wrong. (I don't know if we actually do ever save a hash to disk.)

The hash table in unique.c appears to be a temporary private hash, different from the global R_StringHash. This private hash appears to be used only while the call to unique runs, then free'd. That's my understanding anyway. The suggestion is not to alter the global R_StringHash in any way at all, which is the one that might be saved to disk now or in the future.

I agree with your reading: this is a temporary hash table, and there would be little reason to want to save it (it is not saved now).

+ seth
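The xor-fold suggested above can be written portably by going through uintptr_t, with the shift guarded so it is only compiled where pointers are wider than 32 bits. This is an illustrative sketch of the idea, not code from unique.c.

```c
#include <stdint.h>

/* Fold a pointer into 32 bits for hashing: low word XOR high word on
 * 64-bit platforms, just the value itself on 32-bit platforms. The
 * result would then feed the usual integer scatter function. */
static uint32_t fold_ptr(const void *p)
{
    uintptr_t v = (uintptr_t) p;
#if UINTPTR_MAX > 0xffffffffu
    return (uint32_t) v ^ (uint32_t)(v >> 32);
#else
    return (uint32_t) v;
#endif
}
```

Note that, as discussed in the thread, this only makes sense for a transient table: the fold is stable within one process, but pointer values (and hence the hash) change across sessions, so such a table must never be saved to disk.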
Re: [Rd] Rubbish values written with zero-length vectors (PR#14217)
On 2/20/10 7:50 AM, Peter Dalgaard wrote: You don't want to understand, believe me! ;-) It's a bug, probably not the very worst kind, but accessing memory that isn't yours is potentially harmful (and writing to it is considerably worse).

Looks like the issue only concerns the right hand side; nothing to do with the auto-expansion of v. I also get:

    > v <- integer(0)
    > u <- integer(1)
    > u[[2]] <- v
    > u
    [1]         0 142000760
    > u[[1]] <- v
    > u
    [1] 142000760 142000760
    > a <- 1
    > a[[1]] <- v
    > a
    [1] 142000760

I'm thinking this should be an error, similar to:

    > v <- 1
    > v[[1]] <- integer(3)
    Error in v[[1]] <- integer(3) :
      more elements supplied than there are to replace

but instead "not enough elements supplied". Perhaps:

    > v[[1]] <- integer()
    Error in v[[1]] <- integer() : [[ ]] replacement has zero length

The code in do_subassign2_dflt currently does not check for a zero-length replacement in the nsubs == 1 case. I think we want:

    @@ -1529,6 +1532,8 @@ do_subassign2_dflt(SEXP call, SEXP op, SEXP args, SEXP rho)
         if (nsubs == 0 || CAR(subs) == R_MissingArg)
             error(_("[[ ]] with missing subscript"));
         if (nsubs == 1) {
    +        if (length(y) == 0)
    +            error(_("[[ ]] replacement has zero length"));
             offset = OneIndex(x, thesub, length(x), 0, &newname,
                               recursed ? len-1 : -1, R_NilValue);
             if (isVectorList(x) && isNull(y)) {
                 x = DeleteOneVectorListItem(x, offset);

+ seth
Re: [Rd] R_LIBS_USER bugs
Hi, On 2/16/10 10:31 AM, Jens Elkner wrote: Having currently a big problem with R 2.10.1 vanilla (Solaris): as soon as the R_LIBS_USER env var gets bigger than 1023 chars, R completely ignores it and uses the default:

    > Sys.getenv('R_LIBS_USER')
                                            R_LIBS_USER
    ${R_LIBS_USER-~/R/i386-pc-solaris2.11-library/2.10}

I guess the first question is, why do you need such a long list of library directories?

I see the same thing with R-devel on OS X. I can set R_LIBS_USER from within R using Sys.setenv to a value longer than 1024 and retrieve it again. But if I have such a value in my shell, it gets overwritten. I'm not yet sure what is going on.

+ seth
Re: [Rd] Unexpected behaviour of x[i] when i is a matrix, on Windows
On 2/12/10 10:12 AM, Peter Ehlers wrote: You're comparing 2.10.0 on Windows with 2.11.0 on Linux. Have you tried 2.11.0 on Windows? => same result as on Linux.

Indeed, this is new functionality added to R-devel (5 Jan). Indexing an n-dim array with an n-column matrix used to be supported only when the matrix contained integers. Character matrices are now supported and map to the dimnames of the array. Here's the NEWS entry:

    o  n-dimensional arrays with dimension names can now be indexed
       by an n-column character matrix. The indices are matched
       against the dimension names. NA indices are propagated to
       the result. Unmatched values are not allowed and result in
       an error.

Cheers, + seth
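The name-matching lookup the NEWS entry describes can be sketched in C roughly as follows. This is an illustration of the idea only (function and parameter names are assumptions, not R's internals), though the column-major offset computation does match how R lays out arrays.

```c
#include <string.h>
#include <stddef.h>

/* Map one row of a character index matrix to a zero-based position in
 * a column-major n-dim array: match each name against that dimension's
 * dimnames, then accumulate offset = sum(index[d] * stride[d]).
 * Returns -1 if any name is unmatched (R signals an error instead). */
static ptrdiff_t name_offset(const char **names,      /* one name per dim  */
                             const char ***dimnames,  /* per-dim name sets */
                             const size_t *dimlen,    /* array dimensions  */
                             size_t ndim)
{
    ptrdiff_t offset = 0, stride = 1;
    for (size_t d = 0; d < ndim; d++) {
        ptrdiff_t k = -1;
        for (size_t j = 0; j < dimlen[d]; j++) {
            if (strcmp(names[d], dimnames[d][j]) == 0) {
                k = (ptrdiff_t) j;
                break;
            }
        }
        if (k < 0)
            return -1;           /* unmatched name */
        offset += k * stride;
        stride *= (ptrdiff_t) dimlen[d];
    }
    return offset;
}
```

For a 2x3 matrix with rownames {"a","b"} and colnames {"x","y","z"}, the row ("b","z") resolves to index (1, 2) and hence column-major offset 1 + 2*2 = 5, i.e. the last element, just as m["b","z"] would in R.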
Re: [Rd] Compiling R projects with multiple external libraries
On 2/11/10 9:43 AM, rt wrote: Hi, I have just learned how to compile and link libraries using make and how to create R projects using R CMD build or INSTALL. My understanding of both is somewhat limited, hence the question. I have a main library written in C which depends on other external libraries. The main library is to be called from R using .Call. The goal is to create a single R project that will compile all the external libraries, the main library, and the R-C wrappers, and install it. I am unsure about the proper structure of R project directories and the general workflow such that: (a) the external libraries and the main library are built first using the make setup that I already have; (b) the R-C wrapper is compiled and installed using R CMD INSTALL. I understand that there are issues using Makefiles and that there are preferred ways of doing these things. I am not sure how to use Makevars instead of a Makefile for this purpose. Any help, and in particular pointers to examples of R packages with multiple external libraries, would be appreciated.

Section 1.2.1 "Using Makevars" in WRE (the R-ext manual) has some detail on this and suggests looking at fastICA for an example. Quote from the manual: If you want to create and then link to a library, say using code in a subdirectory, use something like

    .PHONY: all mylibs

    all: $(SHLIB)
    $(SHLIB): mylibs

    mylibs:
            (cd subdir; make)

+ seth
Re: [Rd] src/main/platform.c (PR#14198)
On 1/28/10 3:50 AM, a.r.runna...@kent.ac.uk wrote: At line 312 in src/main/platform.c (at the latest svn revision, 51039):

    if (length(tl) >= 1 || !isNull(STRING_ELT(tl, 0)))

should not '||' read '&&'? Likewise four lines later.

Thanks, I'll fix this up.

+ seth
Re: [Rd] calling setGeneric() twice
On 1/19/10 10:01 AM, Ross Boylan wrote: Is it safe to call setGeneric twice, assuming some setMethod's for the target function occur in between? By safe I mean that all the setMethod's remain in effect, and the 2nd call is, effectively, a no-op. ?setGeneric says nothing explicit about this behavior that I can see. It does say that if there is an existing implicit generic function it will be (re?)used. I also tried ?Methods, google, and the mailing list archives. I looked at the code for setGeneric, but I'm not confident how it behaves. It doesn't seem to do a simple return of the existing value if a generic already exists, although it does have special handling for that case. The other problem with looking at the code--or running tests--is that they only show the current behavior, which might change later. This came up because of some issues with the sequencing of code in my package. Adding duplicate setGeneric's seems like the smallest, and therefore safest, change if the duplication is not a problem.

I'm not sure of the answer to your question, but I think it is the wrong question :-) Perhaps you can provide more detail on why you are using multiple calls to setGeneric. That seems like a very odd thing to do.

+ seth
Re: [Rd] calling setGeneric() twice
On 1/19/10 11:19 AM, Ross Boylan wrote: If files that were read in later in the sequence extended an existing generic, I omitted the setGeneric(). I had to resequence the order in which the files were read to avoid some undefined slot classes warnings. The resequencing created other problems, including some cases in which I had a setMethod without a previous setGeneric. I have seen the advice to sequence the files so that class definitions, then generic definitions, and finally function and method definitions occur. I am trying not to do that for two reasons. First, I'm trying to keep the changes I make small to avoid introducing errors. Second, I prefer to keep all the code related to a single class in a single file. If at first you do not get the advice you want, ask again! :-) Perhaps you could do something like:

if (!isGeneric("blah")) {
    setGeneric("blah", ...)
}

I would expect setGeneric to create a new generic function and nuke/mask methods associated with the generic that it replaces. Some of the files were intended for free-standing use, and so it would be useful if they could retain setGeneric()'s even if I also need an earlier setGeneric to make the whole package work. I am also working on a python script to extract all the generic function definitions (that is, setGeneric()), just in case. Perhaps another option is to group all of the generics together into a package and reuse that? Unless you are using valueClass, I don't think you will need any class definitions. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
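A minimal, self-contained sketch of the isGeneric() guard suggested above; the generic name 'describe' and class 'Thing' are invented for illustration:

```r
library(methods)

## Only create the generic if one does not already exist, so sourcing
## this file a second time is effectively a no-op for the generic and
## does not clobber methods already registered on it.
if (!isGeneric("describe")) {
    setGeneric("describe", function(x, ...) standardGeneric("describe"))
}

setClass("Thing", representation(label = "character"))
setMethod("describe", "Thing", function(x, ...) paste("Thing:", x@label))

describe(new("Thing", label = "demo"))  # "Thing: demo"
```

Running the `if (!isGeneric(...))` block again leaves the "describe" method for "Thing" in place.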
Re: [Rd] optional package dependency
On 1/15/10 7:51 AM, Uwe Ligges wrote: the Windows checks for CRAN run with that setting, i.e. _R_CHECK_FORCE_SUGGESTS_=false Hence the multicore issue mentioned below actually does not exist. I did not know that the Windows checks for CRAN used this setting. My concern was initiated by a Bioconductor package developer wanting to use multicore, and I mistakenly thought the issue would exist for CRAN as well. Bioconductor currently uses the default configuration for check on all platforms. For the CRAN case, there is no immediate problem. While there isn't an issue at hand, the approach still seems lacking. What happens when there is a Windows-only package that folks want to optionally use? Perhaps public repositories should then not force suggests for any platform (do they already?) -- I think that is a reasonable and simple solution. But in that case, perhaps the default value should change. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] optional package dependency
On 1/15/10 7:47 AM, Simon Urbanek wrote: On Jan 15, 2010, at 10:22 , Seth Falcon wrote: I believe another option is:

pkg <- "somePkg"
pkgAvail <- require(pkg, character.only = TRUE)
if (pkgAvail) ... else ...

That is not an option - that is the code you usually use with Suggests: (except for the pkg assignment which is there I presume to obscure things). Unfortunately, it _is_ an option, just not a good one :-) Some packages need to dynamically load other packages (think data packages) and they will not know ahead of time what packages they will load. So there has to be some sort of loop-hole in the check logic. In legitimate cases, this is not obscuring anything. In this case, I think we agree the use would not be legitimate. I'm less and less convinced that the force suggests behavior is useful to anyone. Package repositories can easily attempt to install all suggests and so packages will get complete testing. Package authors should be responsible enough to test their code with and without optional features. The slight convenience for an author of knowing that optional packages are missing is at least equally balanced by the slight inconvenience of having to change the check configuration in order to test the case of missing suggests. Anyway... __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] optional package dependency
On 1/15/10 12:19 AM, Kurt Hornik wrote: Jeff Ryan writes: Hi Ross, The quantmod package makes available routines from a variety of contributed packages, but gets around your issues with a bit of, um, trickery. Take a look here (unless your name is Kurt ;-) ): I believe another option is:

pkg <- "somePkg"
pkgAvail <- require(pkg, character.only = TRUE)
if (pkgAvail) ... else ...

But Kurt will be happy to tell you that you can turn off forcing suggested packages for checking by setting _R_CHECK_FORCE_SUGGESTS_=false in your environment. The idea is that maintainers typically want to fully check their functionality, suggesting to force suggests by default. Unless the public repositories such as CRAN and Bioconductor decide to set this option, it provides no solution for anyone who maintains or plans to make available a package through a public R repository such as CRAN or Bioconductor. There is a real need (of some kind) here. Not all packages work on all platforms. For example, the multicore package provides a mechanism for running parallel computations on a multi-cpu box, but it is not available on Windows. A package that _is_ available on all platforms should be able to optionally make use of multicore on non-Windows. I don't think there is a way to do that now and pass check without resorting to tricks as above. These tricks are bad as they make it harder to programmatically determine the true suggests. And NAMESPACE brings up another issue in that being able to do conditional imports would be very useful for these cases; otherwise you simply can't make proper use of name spaces for any optional functionality. I'm willing to help work on and test a solution if we can arrive at some consensus as to what the solution looks like. Best, + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How x[, 'colname1'] is implemented?
On 1/1/10 1:40 PM, Peng Yu wrote: On Fri, Jan 1, 2010 at 6:52 AM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote: On Thu, Dec 31, 2009 at 11:27 PM, Peng Yu pengyu...@gmail.com wrote: I don't see where the implementation of '[]' is described. For example, if x is a matrix or a data.frame, how is the lookup of 'colname1' in x[, 'colname1'] executed? Does R perform a lookup in a hash of the colnames? Is the reference O(1) or O(n), where n is the second dim of x? Where have you looked? I doubt this kind of implementation detail is in the .Rd documentation since a regular user doesn't care for it. I'm not complaining that it is not documented. As Obi-wan Kenobi may have said in Star Wars: Use the source, Luke! Line 450 of subscript.c in the source code of R 2.10 is the stringSubscript function. It has this comment: /* The original code (pre 2.0.0) used a ns x nx loop that was too * slow. So now we hash. Hashing is expensive on memory (up to 32nx * bytes) so it is only worth doing if ns * nx is large. If nx is * large, then it will be too slow unless ns is very small. */ Could you explain what ns and nx represent? integers :-) Consider a 5x5 matrix m and a call like m[ , c("C", "D")], then in the call to stringSubscript: s - the character vector of subscripts, here c("C", "D"); ns - length of s, here 2; nx - length of the dimension being subscripted, here 5; names - the dimnames being subscripted, here perhaps c("A", "B", "C", "D", "E"). The definition of large and small here appears to be such that: 457: Rboolean usehashing = in && ( ((ns > 1000 && nx) || (nx > 1000 && ns)) || (ns * nx > 15*nx + ns) ); The 'in' argument is always TRUE AFAICS, so this boils down to: use hashing for x[i] if either length(x) > 1000 or length(i) > 1000 (and we aren't in the trivial case where either length(x) == 0 or length(i) == 0), OR use hashing if (ns * nx > 15*nx + ns). + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
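To make ns and nx concrete, here is a runnable version of the toy example from the reply (the matrix values and names are invented):

```r
## A 5x5 matrix with column names, mirroring the m[ , c("C", "D")]
## example: ns = 2 subscripts matched against nx = 5 columns -- far
## below the thresholds, so no hashing would be used here.
m <- matrix(1:25, nrow = 5,
            dimnames = list(NULL, c("A", "B", "C", "D", "E")))
m[, c("C", "D")]   # same result as m[, 3:4]
```

For very wide matrices or long name subscripts, the hashed path kicks in per the condition quoted above.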
Re: [Rd] Error in namespaceExport(ns, exports) :
On 12/3/09 3:10 PM, David Scherrer wrote: Dear all, I get the error "Error in namespaceExport(ns, exports) : undefined exports function1, function2" when compiling or even when I roxygenize my package. The two functions were once in my package, but I deleted them, including their .Rd files. I also can't find them in any other function or help file. So does anybody know where these functions could still be listed, causing this error? Are you sure they are not in your NAMESPACE file? -- Seth Falcon Program in Computational Biology | Fred Hutchinson Cancer Research Center __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to generate dependency file that can be used by gnu make?
On 11/17/09 5:02 AM, Peng Yu wrote: This may not be easy to do when the filenames are not hard-coded strings. For example, the variable 'filename' is a vector of strings.

for (i in 1:length(filename)) {
    ## do something...
    save(..., file = filename[i])
}

That's right. I don't think there is a feasible general solution. You might have more success with a convention-based approach for your scripts that would allow a simple parser to identify output files by name convention, for example. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] error checks
On 11/13/09 8:02 AM, Tony Plate wrote: Putting options(error=function() NULL) at the start of the .R file will let R CMD check continue with commands in a file after stop() is called. (Or anything other than the default options(error=NULL).) But that's a rather heavy-handed approach and could easily mask errors that you are not expecting. Instead, how about using tryCatch so that you limit the errors that you trap and also can verify that an error was indeed trapped. Perhaps something like this:

f <- function(x) if (x) stop("crash!") else NULL
res <- tryCatch({
    f(TRUE)   # this will raise an error
    FALSE     # only get here if no error
}, error = function(e) TRUE)
## verify we saw an error
stopifnot(res)

+ seth -- Seth Falcon | @sfalcon | http://userprimary.net/users __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] typo in docs for unlink()
On 11/11/09 2:36 AM, Duncan Murdoch wrote: On 10/11/2009 11:16 PM, Tony Plate wrote: PS, I should have said that I'm reading the docs for unlink in R-2.10.0 on a Linux system. The docs that appear in a Windows installation of R are different (the Windows docs do not mention that not all systems support recursive=TRUE). Here's a plea for docs to be uniform across all systems! Trying to write R code that works on all systems is much harder when the docs are different across systems, and you might only see system-specific notes on a different system than the one you're working on. That's a good point, but in favour of the current practice, it is very irritating when searches take you to functions that don't work on your system. One thing that might be possible is to render all versions of the help on all systems, but with some sort of indicator (e.g. a colour change) to indicate things that don't apply on your system, or only apply on your system. I think the hardest part of doing this would be designing the output; actually implementing it would not be so bad. I would be strongly in favor of a change that provided documentation for all systems on all systems. Since platform-specific behavior for R functions is the exception rather than the norm, I would imagine that simply displaying doc sections by platform would be sufficient. I think the benefit of being able to see what might not work on another platform far outweighs the inconvenience of finding doc during a search for something that only works on another platform -- hey, that still might be useful as it would tell you what platform you should use ;-) + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] memory misuse in subscript code when rep() is called in odd way
Hi, On 11/3/09 2:28 PM, William Dunlap wrote: The following odd call to rep() gives somewhat random results: rep(1:4, 1:8, each=2) I've committed a fix for this to R-devel. I admit that I had to reread the rep man page as I first thought this was not a valid call to rep since times (1:8) is longer than x (1:4), but closer reading of the man page says: If times is a vector of the same length as x (after replication by each), the result consists of x[1] repeated times[1] times, x[2] repeated times[2] times and so on. So the expected result is the same as rep(rep(1:4, each=2), 1:8). valgrind says that the C code is using uninitialized data: rep(1:4, 1:8, each=2) ==26459== Conditional jump or move depends on uninitialised value(s) ==26459==at 0x80C557D: integerSubscript (subscript.c:408) ==26459==by 0x80C5EDC: Rf_vectorSubscript (subscript.c:658) A little investigation seems to suggest that the problem is originating earlier. Debugging in seq.c:do_rep I see the following: rep(1:4, 1:8, each=2) Breakpoint 1, do_rep (call=0x102de0068, op=value temporarily unavailable, due to optimizations, args=value temporarily unavailable, due to optimizations, rho=0x1018829f0) at /Users/seth/src/R-devel-all/src/main/seq.c:434 434 ans = do_subset_dflt(R_NilValue, R_NilValue, list2(x, ind), rho); (gdb) p Rf_PrintValue(ind) [1] 1 1 1 2 2 2 [7] 2 2 2 2 3 3 [13] 3 3 3 3 3 3 [19] 3 3 3 4 4 4 [25] 4 4 4 4 4 4 [31] 4 4 4 4 4 4 [37] 44129344 1 44129560 1 44129776 1 [43] 44129992 1 44099592 1 44099808 1 [49] 44100024 1 44100456 127241443801089 [55] -536870733 0 54857992 1 22275728 1 [61]2724144 1 34 1 44100744 1 [67] 44100960 1 44101176 1 43652616 1 $2 = void (gdb) c Continuing. Error: only 0's may be mixed with negative subscripts The patch I applied adjusts how the index vector length is computed when times has length more than one. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
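With the fix applied, the equivalence quoted from the rep man page can be checked directly (on any R that includes the fix):

```r
## Per ?rep: when 'times' has the same length as x after replication by
## 'each', element i of the expanded x is repeated times[i] times. So
## rep(1:4, 1:8, each=2) must equal rep(rep(1:4, each=2), 1:8).
a <- rep(1:4, times = 1:8, each = 2)
b <- rep(rep(1:4, each = 2), times = 1:8)
identical(a, b)   # TRUE
length(a)         # sum(1:8) == 36
```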
Re: [Rd] parse_Rd and/or lazyload problem
Hi, On 11/3/09 6:51 PM, mark.braving...@csiro.au wrote:

file.copy('d:/temp/Rdiff.Rd', 'd:/temp/scrunge.Rd') # Rdiff.Rd from 'tools' package source
eglist <- list(scrunge = parse_Rd('d:/temp/scrunge.Rd'))
tools:::makeLazyLoadDB(eglist, 'd:/temp/ll')
e <- new.env()
lazyLoad('d:/temp/ll', e)
as.list(e) # force; OK
eglist1 <- list(scrunge = parse_Rd('d:/temp/Rdiff.Rd'))
tools:::makeLazyLoadDB(eglist1, 'd:/temp/ll')
e <- new.env()
lazyLoad('d:/temp/ll', e)
as.list(e) # Splat

It doesn't make any difference which file I process first; the error comes the second time round. If I adjust this example in terms of paths and run on OS X, I get the following error on the second run:

as.list(e) # Splat
Error in as.list.environment(e) : internal error -3 in R_decompress1

I haven't looked further yet. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Help with lang4
On 10/29/09 7:00 AM, Abhijit Bera wrote: Hi I seem to have run into a situation where I have more than 3 arguments to pass to a function from C. The following functions help me build an expression for evaluation: lang1 lang2 lang3 lang4 What should one do if there are more arguments than lang4 can handle? If you take a look at the source code for those functions, something may suggest itself. R function calls at the C level are composed like in lisp: a pair-list starting with the function cons'ed with the args. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Help with lang4
On 10/29/09 7:38 AM, Abhijit Bera wrote: Can't find the source to the Rf_lang* series of functions. :| But I'm thinking it should be like this, correct me if I'm wrong:

PROTECT(e = lang4(install("myfunction"), arg1, arg2, arg3));
PROTECT(SETCAR(CDR(e), portConstraints));
PROTECT(portVal = R_tryEval(e, R_GlobalEnv, NULL));

Perhaps I'm misunderstanding your goal, but I do not think this is correct. After this call: PROTECT(e = lang4(install("myfunction"), arg1, arg2, arg3)); e can be visualized as: (myfunction (arg1 (arg2 (arg3 nil)))) If you want to end up with: (myfunction (arg1 (arg2 (arg3 (arg4 nil))))) Then you either will want to build up the pair list from scratch or you could use some of the helpers, e.g. (all untested):

SEXP last = lastElt(e);
SEXP arg4Elt = lang1(arg4);
SETCDR(last, arg4Elt);

Reading Rinlinedfuns.h should help some. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] (PR#14012)
* On 2009-10-16 at 15:00 +0200 sj...@damtp.cam.ac.uk wrote: I think Rscript has a problem running files that have Mac encodings for newline (^M rather than ^J on Linux). If I source the file within R, it works okay: source('j.R') [1] MEA_data/sernagor_new/CRX_P7_1.txt But if I run the file using Rscript on a Linux box I get a strange error message: $ Rscript --vanilla j.R Execution halted I think you are right that Rscript is unhappy handling files with CR line terminators. But IIUC, the purpose of Rscript is to enable R script execution on unix-like systems like:

#!/path/to/Rscript --vanilla
print(1:10)

So then I'm not sure how useful it is for Rscript to handle such files. Why not convert to a more common and portable line termination for your R script files? + seth -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
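One way to do the suggested conversion from within R itself; the file names here ('mac.R', 'unix.R') are hypothetical:

```r
## Create a CR-terminated script (as an old Mac editor might produce),
## then rewrite it with LF line endings so Rscript can parse it.
cat("x <- 1:3\rprint(sum(x))\r", file = "mac.R")

txt <- readChar("mac.R", file.info("mac.R")$size)
## split on CR and let writeLines terminate each line with "\n"
writeLines(strsplit(txt, "\r")[[1]], "unix.R")

readLines("unix.R")
```

On unix-like systems, `tr -d '\r'` or `dos2unix`-style tools accomplish the same thing outside R.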
Re: [Rd] how to document stuff most users don't want to see
Writing good documentation is hard. I can appreciate the desire to find technological solutions that improve documentation. However, the benefit of a help system that allows for varying degrees of verbosity is very likely to be overshadowed by the additional complexity imposed on the help system. Users would need to learn how to tune the help system. Developers would need to learn and follow the system of variable verbosity. This time would be better spent by developers simply improving the documentation and by users simply reading the improved documentation. My $0.02. + seth -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] unit testing for R packages?
Hi, On Mon, Oct 5, 2009 at 12:01 PM, Blair Christian blair.christ...@gmail.com wrote: I'm interested in putting some unit tests into an R package I'm building. I have seen assorted things such as the RUnit library, the svUnit library, packages with 'tests' directories, etc. I grep'd 'unit test' through the Writing R Extensions manual but didn't find anything. Are there any suggestions out there? Currently I have several (a lot?) classes/methods that I keep tinkering with, and I'd like to run a script frequently to check that I don't cause any unforeseen problems. I've had good experiences using RUnit. To date, I've mostly used RUnit by putting tests in inst/unitTests and creating a Makefile there to run the tests. You should also be able to use RUnit in a more interactive fashion inside an interactive R session in which you are doing development. The vignette in svUnit has an interesting approach for integrating unit testing into R CMD check via examples in an Rd file within the package. + seth -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] build time dependency
On Mon, Sep 28, 2009 at 11:25 AM, Romain Francois romain.franc...@dbmail.com wrote: Hi Uwe, I think you are supposed to do this kind of sequence: R CMD roxygen yourRoxygenablePackage R CMD build yourRoxygenablePackage_roxygen ... but I don't like this because what you upload to cran is not the actual source but something already pre-processed. (This also applies to packages shipping java code, most people just compile the java code on their machine and only supply a jar of compiled code, but that's another story I suppose ...) I'd prefer the roxygenation to be part of the standard build/INSTALL system, so my plan is to write configure and configure.win which would call roxygenize to generate Rd. I can appreciate the desire to make the true sources available. At the same time, I think one should very carefully consider the expense of external dependencies on a package. One could view doc generation along the same lines as configure script generation -- a compilation step that can be done once instead of by all those who install, and as a result reduce the dependency burden of those wanting to install the package. Configure scripts are almost universally included pre-built in distribution source packages so that users do not need to have the right version of autoconf/automake. In other words, are you sure you want to require folks to install roxygen (or whatever) in order to install your package? Making it easy to do so is great, but in general if you can find a way to reduce dependencies and have your package work, that is better. :-) + seth -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] crash with NAs in subscripted assignment of a raw vector
2009/9/24 Hervé Pagès hpa...@fhcrc.org:

> x <- charToRaw("ABCDEFGx")
> x[c(1:3, NA, 6)] <- x[8]
*** caught segfault ***
address 0x8402423f, cause 'memory not mapped'

Thanks for the report. I have a fix which I will commit after some testing. -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Rcmdr package dependencies
* On 2009-09-22 at 20:16 +0200 Uwe Ligges wrote: no, this is not possible. Consider your package A (or Rcmdr) suggests B that suggests C. Then A::foo uses the function B::bar which only works if C::dep is present. B works essentially without C but it requires C just to make bar work. Then this means your A::foo won't work if C is not installed and you won't get it with the setup mentioned above. In summary, I fear what you want might work well *now* (by chance), but it does not work in general. In general, one would expect a given package to function when its suggested packages are not available. As such, it seems quite reasonable to install a package, its Depends, Imports, and Suggests, but not install Suggests recursively. I think you could achieve such an installation using two calls to install.packages:

install.packages("Rcmdr")
Rcmdr.Suggests <- strsplit(packageDescription("Rcmdr")$Suggests, ",\\s?")[[1]]
## need extra cleanup since packageDescription("blah")$Suggests
## returns package names with versions as strings
wantPkgs <- sub("^([^ ]+).*", "\\1", Rcmdr.Suggests)
havePkgs <- installed.packages()[, "Package"]
wantPkgs <- wantPkgs[!(wantPkgs %in% havePkgs)]
install.packages(wantPkgs)

+ seth -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
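The version-stripping step can be exercised without any package installed, using a made-up Suggests field of the shape packageDescription() returns (the package names below are invented):

```r
## A hypothetical Suggests field: names optionally followed by a
## version requirement in parentheses.
suggests <- "foo (>= 1.0), bar, baz (>= 0.2-1)"
pkgs <- strsplit(suggests, ",\\s?")[[1]]

## keep only the leading package name, dropping any version spec
sub("^([^ ]+).*", "\\1", pkgs)   # "foo" "bar" "baz"
```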
Re: [Rd] asking for suggestions: interface for a C++ class
* On 2009-09-04 at 22:54 +0200 Yurii Aulchenko wrote: We are at an early stage of designing an R library, which is effectively an interface to a C++ library providing fast access to large matrices stored on HDD as binary files. The core of the C++ library is a relatively sophisticated class, which we try to mirror using an S4 class in R. Basically when a new object of that class is initiated, the C++ constructor is called and essential elements of the new object are reflected as slots of the R object. Have a look at external pointers as described in the Writing R Extensions manual. Now as you can imagine the problem is that if the R object is removed using say the rm command, and not our specifically designed one, the C++ object still hangs around in RAM until the R session is terminated. This is not nice, and also may be a problem, as the C++ object may allocate a large part of RAM. We can of course replace the generic rm and delete functions, but this is definitely not a nice solution. You likely want a less literal translation of your C++ object into R's S4 system. One slot should be an external pointer, which will give you the ability to define a finalizer to clean up when the R level object gets gc'd. + seth -- Seth Falcon | @sfalcon | http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Conditional dependency between packages
Hi Jon, * On 2009-06-30 at 15:27 +0200 Jon Olav Skoien wrote: I work on two packages, pkg1 and pkg2 (in two different projects). pkg1 is quite generic, pkg2 tries to solve a particular problem within the same field (geostatistics). Therefore, there might be users who want to use pkg2 as an add-on package to increase the functionality of pkg1. In other words, functions in pkg1 are based on the S3 class system, and I want pkg2 to offer methods for pkg2-objects to functions defined in pkg1, for users having both packages installed. Merging the packages or making pkg2 always depend on pkg1 would be the easiest solution, but it is not preferred as most users will only be interested in one of the packages. I'm not sure I understand the above, I think you may have a pkg2 where you meant pkg1, but I'm not sure it matters. I think the short version is: pkg2 can be used on its own but will do more if pkg1 is available. I don't think R's packaging system currently supports conditional dependencies as you might like. However, I think you can get the behavior you want by following a recipe like: * In the pkg2 DESCRIPTION, list Suggests: pkg1. * In pkg2 code, you might define a package-level environment and in .onLoad check to see if pkg1 is available:

PKG_INFO <- new.env(parent = emptyenv())

.onLoad <- function(libname, pkgname) {
    if (<check if pkg1 is available>) {
        PKG_INFO[["pkg1"]] <- TRUE
    }
}

* Then your methods can check PKG_INFO[["pkg1"]]. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] bug in Rf_PrintValue ?
Hi, * Kynn Jones wrote: I'm very green with R, so maybe this is not a bug, but it looks like one to me. The following program segfaults at the second call to Rf_PrintValue(). Yes, I think you found a bug. * On 2009-06-26 at 16:09 -0700 Martin Morgan wrote: mkChar creates a CHARSXP. These are not normally user-visible, but instead are placed into a STRSXP (vector type of 'character' in R). So you want to PROTECT( x_r = allocVector(STRSXP, 1) ); SET_STRING_ELT(x_r, 0, mkChar( x )); (There is also mkString( x ) for the special case of constructing character(1)). I think the segfault is because the CHARSXP returned by mkChar is initialized with information different from that expected of user-visible SEXPs (I think it is the information on chaining the node to the hash table; see Defn.h:120 and memory.c:2844); I think the success of Rf_PrintValue on 'foo' is a ghost left over from when CHARSXPs were user-visible. CHARSXPs are not intended to be user-visible. However, Rf_PrintValue should not segfault either. Indeed the root cause was attempting to print the attributes for the CHARSXP which have been repurposed for handling the CHARSXP cache. I have patched R-devel so that PrintValue works as expected on CHARSXPs. The original code should now work without crashing. But this should really only be used to assist in debugging. CHARSXPs should never be exposed at the user level and should instead be elements of a character vector (STRSXP). + seth -- Seth Falcon http://userprimary.net/user __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64
Hi Dirk, * On 2009-01-30 at 22:38 -0600 Dirk Eddelbuettel wrote: Turns out, as so often, that there was a regular bug lurking which is now fixed in RDieHarder 0.1.1. But I still would like to understand exactly what is different so that --slave was able to trigger it when --vanilla, --no-save, ... did not. [ The library() vs require() issue may have been a red herring. ] Without telling us any details about the nature of the bug you found, it is difficult to speculate. If the bug was in your C code and memory related, it could simply be that the two different run paths resulted in different allocation patterns, one of which triggered the bug. + seth -- Seth Falcon | http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64
* On 2009-01-31 at 09:34 -0600 Dirk Eddelbuettel wrote: | Without telling us any details about the nature of the bug you found, | it is difficult to speculate. If the bug was in your C code and | memory related, it could simply be that the two different run paths | resulted in different allocation patterns, one of which triggered the | bug. Yes yes and yes :) It was in C, and it was memory related, and it dealt with getting results out of the library to which the package interfaces. But short of looking at the source, is there any documentation on what --slave does differently? The R-intro manual has a brief description: --slave Make R run as quietly as possible. This option is intended to support programs which use R to compute results for them. It implies --quiet and --no-save. I suspect that for more detail than that, one would have to look at the sources. But the above helps explain the behavior you saw; a --quiet R will suppress some output and that will make a difference in terms of memory allocation. + seth -- Seth Falcon | http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R crashes on sprintf with bad format specification (PR#13283)
* On 2008-11-13 at 18:51 -0500 Duncan Murdoch wrote: On 12/11/2008 8:30 PM, [EMAIL PROTECTED] wrote: Full_Name: Oren Cheyette Version: 2.7.2 OS: Win XP Submission from: (NULL) (64.161.123.194) Enter the following at the R command prompt: sprintf("A %S %S %S XYZ", 1, 1, 1); Note the erroneous capitalized %S instead of %s and the numeric inputs instead of strings. With strings there's no crash - R reports bad format specifications. 2.7.2 is obsolete, but I can confirm a crash on Windows with a recent R-devel. Can confirm as well on OSX with a fairly recent R-devel. (gdb) bt 10 #0 0x9575e299 in _UTF8_wcsnrtombs () #1 0x957bb3a0 in wcsrtombs_l () #2 0x956ebc1e in __vfprintf () #3 0x95711e66 in sprintf () #4 0x00492bb8 in do_sprintf (call=0x10cb470, op=0x1018924, args=value temporarily unavailable, due to optimizations, env=0x10a40b0) at ../../../../R-devel-all/src/main/sprintf.c:179 #5 0x003fe1af in do_internal (call=0x10cb4a8, op=0x100fc38, args=0x10a40e8, env=0x10a40b0) at ../../../../R-devel-all/src/main/names.c:1140 + seth -- Seth Falcon | http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] S4 coercion responsibility
A couple more comments... * On 2008-09-15 at 10:07 -0700 Seth Falcon wrote: The example is with RSQLite but the same thing happens with RMySQL, and other DBI packages. The use of as() within the various DBI packages should be re-evaluated. I suspect some of that code was among the first to make heavy use of S4. As S4 has evolved and become better documented and understood, the DBI packages may not always have had a chance to keep up.

library(RSQLite)
Loading required package: DBI
m <- dbDriver("SQLite")
con <- dbConnect(m)
setClass("SQLConPlus", contains = c("SQLiteConnection", "integer"))
[1] "SQLConPlus"
conPlus <- new("SQLConPlus", con, 1)
dbListTables(con)
character(0)
dbListTables(conPlus)

In the latest R-devel code (svn r46542), this behaves differently (and works as you were hoping). I get:

library(RSQLite)
setClass("SQLConPlus", contains = c("SQLiteConnection", "integer"))
dd = data.frame(a = 1:3, b = letters[1:3])
con = new("SQLConPlus", dbConnect(SQLite(), ":memory:"), 11L)
dbWriteTable(con, "t1", dd)
dbListTables(con)
dbDisconnect(con)

I know that the methods package has been undergoing some improvements recently, so it is not entirely surprising that behavior has changed. I think the new behavior is desirable as it follows the rule that the order of the superclasses listed in contains is used to break ties when multiple methods match. Here, there are two coerce() methods (invoked via as()) one for SQLiteConnection and one, I believe auto-generated, for integer. Since SQLiteConnection comes first, it is chosen. Indeed, if you try the following, you get the error you were originally seeing:

setClass("SQLConMinus", contains = c("integer", "SQLiteConnection"))
con2 = new("SQLConMinus", dbConnect(SQLite(), ":memory:"), 11L)
as(con, "integer")
[1] 15395 2
as(con2, "integer")
[1] 11

Why not extend SQLiteConnection and add extra slots as you like? The dispatch will in this case be much easier to reason about. This is still appropriate advice. 
In general, inheritance should be used with care, and multiple inheritance should be used with multiple care. Using representation() to add additional slots likely makes more sense here.

+ seth

--
Seth Falcon | http://userprimary.net/user/

    sessionInfo()
    R version 2.8.0 Under development (unstable) (--)
    i386-apple-darwin9.4.0

    locale:
    C

    attached base packages:
    [1] stats graphics grDevices datasets utils methods base

    other attached packages:
    [1] RSQLite_0.7-0 DBI_0.2-4
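[As a sketch of that advice -- class and slot names invented here -- single inheritance plus a slot leaves only one coerce path to reason about:]

```r
library(RSQLite)

## Hypothetical alternative to contains=c("SQLiteConnection", "integer"):
## one parent class, and the extra integer carried in a slot.
setClass("SQLConPlus2",
         contains = "SQLiteConnection",
         representation(extra = "integer"))

con <- new("SQLConPlus2",
           dbConnect(SQLite(), ":memory:"),
           extra = 11L)

as(con, "SQLiteConnection")  # only one coerce method can match now
con@extra
```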
Re: [Rd] S4 coercion responsibility
* On 2008-09-17 at 19:25 -0700 Seth Falcon wrote:
> In the latest R-devel code (svn r46542), this behaves differently
> (and works as you were hoping). I get:
>
>     library(RSQLite)
>     setClass("SQLConPlus", contains=c("SQLiteConnection", "integer"))
>     dd = data.frame(a=1:3, b=letters[1:3])
>     con = new("SQLConPlus", dbConnect(SQLite(), ":memory:"), 11L)
>     dbWriteTable(con, "t1", dd)
>     dbListTables(con)
>     dbDisconnect(con)

*argh* I'm certain this was working for me and yet when I try to reproduce in a new R shell it errors out. The dispatch is not as I wrote:

    as(con, "integer")
    [1] 11

That is, the auto-generated coerce method to integer is selected in preference to the coerce method for SQLiteConnection.

> I think the new behavior is desirable as it follows the rule that the
> order of the superclasses listed in contains is used to break ties
> when multiple methods match. Here, there are two coerce() methods
> (invoked via as()): one for SQLiteConnection and one, I believe
> auto-generated, for integer. Since SQLiteConnection comes first, it
> is chosen. Indeed, if you try the following, you get the error you
> were originally seeing:
>
>     setClass("SQLConMinus", contains=c("integer", "SQLiteConnection"))
>     con2 = new("SQLConMinus", dbConnect(SQLite(), ":memory:"), 11L)
>     as(con, "integer")
>     [1] 15395 2
>     as(con2, "integer")
>     [1] 11

I'm still baffled how this was working for me and now is not. Nevertheless, I think it is how things *should* work and will do some further investigation about what's going on.

--
Seth Falcon | http://userprimary.net/user/
Re: [Rd] S4 coercion responsibility
Continuing to talk to myself here...

* On 2008-09-17 at 21:06 -0700 Seth Falcon wrote:
> *argh* I'm certain this was working for me and yet when I try to
> reproduce in a new R shell it errors out.

This looks like an infelicity in the methods caching. To make it work:

    library(RSQLite)
    setClass("SQLConPlus", contains=c("SQLiteConnection", "integer"))
    dd = data.frame(a=1:3, b=letters[1:3])
    con.orig = dbConnect(SQLite(), ":memory:")
    con = new("SQLConPlus", con.orig, 11L)

    ## call selectMethod, must have a side-effect on the
    ## methods cache
    selectMethod("coerce", signature=c("SQLConPlus", "integer"))

    dbWriteTable(con, "t1", dd)
    dbListTables(con)
    dbDisconnect(con)

Now I get:

    as(con, "integer")
    [1] 15719 0

Haven't tried the above in an older version of R.

--
Seth Falcon | http://userprimary.net/user/
Re: [Rd] S4 coercion responsibility
* On 2008-09-15 at 08:56 -0400 Paul Gilbert wrote:
> Should functions or the user be responsible for coercing an S4 object
> argument containing the proper object (and thus should the below be
> considered a bug in the packages or not)? The example is with RSQLite
> but the same thing happens with RMySQL, and other DBI packages.
>
>     library(RSQLite)
>     Loading required package: DBI
>     m <- dbDriver("SQLite")
>     con <- dbConnect(m)
>     setClass("SQLConPlus", contains=c("SQLiteConnection", "integer"))
>     [1] "SQLConPlus"
>     conPlus <- new("SQLConPlus", con, 1)
>     dbListTables(con)
>     character(0)
>     dbListTables(conPlus)
>     Error in sqliteExecStatement(con, statement, bind.data) :
>       RS-DBI driver: (invalid dbManager handle)
>     dbListTables(as(conPlus, "SQLiteConnection"))
>     character(0)
>
> The problem is happening in sqliteExecStatement which does
>
>     conId <- as(con, "integer")
>
> but con only *contains* an SQLiteConnection and the other integer
> causes confusion. If the line were
>
>     conId <- as(as(con, "SQLiteConnection"), "integer")
>
> everything works. I can work around this, but I am curious where
> responsibility for this coercion should be.

Well, you've created a class that is-a SQLiteConnection *and* is-a integer. The fact that the as() method dispatch doesn't match that of SQLiteConnection shouldn't really be that surprising. I don't see how this could be the responsibility of the author of the class you've subclassed.

I would also question why SQLConPlus is extending integer. That seems like a very strange choice. Why not extend SQLiteConnection and add extra slots as you like? The dispatch will in this case be much easier to reason about.

+ seth

--
Seth Falcon | http://userprimary.net/user/
Re: [Rd] how to install header files in package
* On 2008-06-13 at 08:40 -0500 Dirk Eddelbuettel wrote: On 13 June 2008 at 14:28, Kjell Konis wrote: | Is there a way to get R CMD INSTALL (and friends) to copy the header | files from a source package's src directory to the include directory? Only if you (ab-)use the 'make all' target in src/Makefile to copy them, as a recent thread on r-devel showed. Some of us suggested that a 'make install' target would be a nice thing to have. Can you elaborate on the use case? If the desire is to allow pkgB to access header files provided by pkgA, then you can use the LinkingTo field in the DESCRIPTION file as described in Writing R Extensions in the Registering native routines section. + seth -- Seth Falcon | http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
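[A minimal sketch of the LinkingTo arrangement described above; package and file names are invented for illustration. pkgA ships its headers under inst/include/, and pkgB declares the dependency:]

```
## pkgA: put headers under inst/include/, e.g. inst/include/pkgA.h
## (they are installed to pkgA's include/ directory)

## pkgB/DESCRIPTION:
Package: pkgB
Depends: pkgA
LinkingTo: pkgA

## pkgB/src/foo.c can then do
##     #include <pkgA.h>
## because R CMD INSTALL adds pkgA's installed include directory to
## the preprocessor search path when compiling pkgB.
```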
[Rd] RSQLite 0.6-9 uploaded to CRAN [was: RSQLite bug fix for install with icc]
Hi all,

A new version of RSQLite has been uploaded to CRAN and should be available soon. This update contains a minor change to the C code that should improve compatibility on various Unix OSes.

+ seth

--
Seth Falcon | http://userprimary.net/user/
Re: [Rd] RSQLite bug fix for install with icc
Hi Mark,

[the r-sig-db list might have been a better spot for this...]

* On 2008-06-04 at 14:28 -0400 Mark Kimpel wrote:
> I encountered problems installing RSQLite, R-2.7.0, on RHEL4 using
> Intel 10.1 icc. My sysadmin helped me track down the problem and
> kindly forwarded me the fix, which corrected the problem. What
> follows is from the sysadmin. Mark
>
> I looked at the error, looks like there is a bug in the source code.
> I've attached a new tarball, hopefully fixed. I added
> #include <sys/types.h> immediately before #include <unistd.h> in
> RSQLite/src/RS-DBI.h

I will see about making such a change. I suspect the correct fix is one that tweaks configure to determine where things are based on the current system (the current code is correct for gcc, I believe). Anyhow, thanks for the report. I will try to have an update within a week.

+ seth

--
Seth Falcon | http://userprimary.net/user/
Re: [Rd] NAMESPACE methods guidance, please
* On 2008-06-01 at 11:30 -0400 John Chambers wrote:
> My impression (but just as a user, not an implementer) is that the
> NAMESPACE mechanism is intended to search for anything, not just for
> methods, as follows:
> - look in the namespace itself;
> - look in the imports, which are in the parent.env of the namespace;
> - look in the base package's namespace.

As described in the R News article [1], the above describes the static component of the search mechanism, but there is a dynamic component which adds:

- look in .GlobalEnv
- look in each package on the search path
- look (again) in base

[1] http://cran.r-project.org/doc/Rnews/Rnews_2003-1.pdf

> Period. This provides a definition of the behavior of functions in
> the package that is independent of the dynamically changing contents
> of the search list.

I think the dynamic lookup is important. Consider class Foo and some methods, like show, for working with Foo instances defined in pkgA. Further, suppose pkgB imports pkgA and contains a function that returns a Foo instance. If a user calls library(pkgB) at the prompt, both the developer and the user would like for methods for dealing with Foo instances to be available. This has been achieved by adding pkgA to the Depends field of pkgB. In this case, library(pkgB) has the side-effect of attaching pkgA to the search path and Foo instances behave as desired. This, I believe, describes the first part of Martin's example:

Martin Morgan:
> library(KEGG.db)  # Imports, Depends AnnotationDbi; KEGG.db is data-only
> head(ls(KEGGPATHID2EXTID))
> [1] "hsa00232" "hsa00230" "hsa04514" "hsa04010" "hsa04012" "hsa04150"

John Chambers:
> Depends may cause the relevant packages to be put on the search list.
> But a subsequent attach or detach could change what objects were
> found. So unless this is not the intended interpretation of
> namespaces, looking in the search list seems a bad idea in principle.

I agree that using the dynamic lookup when the static lookup is available is bad programming practice.
However, given the flexibility of the current tools, it seems not unreasonable to expect that picking up a method via the search path would work in a package just as it does (should?) interactively. + seth -- Seth Falcon | http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
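[A sketch of what the static alternative looks like in practice; package and function names are hypothetical. Instead of relying on pkgA sitting on the search path at run time, pkgB imports what it needs in its NAMESPACE:]

```
## pkgB/NAMESPACE (hypothetical)
importFrom(pkgA, foo)   # static: resolved at load time,
                        # independent of the search path
export(makeFoo)

## versus the dynamic route: list pkgA in pkgB's Depends field, so that
## library(pkgB) also attaches pkgA and methods for Foo instances are
## found via the search path.
```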
Re: [Rd] R 2.7.0, match() and strings containing \0 - bug?
Hi Jon,

* On 2008-04-28 at 11:00 +0100 Jon Clayden wrote:
> A piece of my code that uses readBin() to read a certain file type is
> behaving strangely with R 2.7.0. This seems to be because of a
> failure to match() strings after using rawToChar() when the original
> was terminated with a \0 character. Direct equality testing with ==
> still works as expected. I can reproduce this as follows:
>
>     x <- "foo"
>     y <- c(charToRaw("foo"), as.raw(0))
>     z <- rawToChar(y)
>     z == x
>     [1] TRUE
>     z == "foo"
>     [1] TRUE
>     z %in% c("foo", "bar")
>     [1] FALSE
>     z %in% c("foo", "bar", "foo\0")
>     [1] FALSE
>
> But without the nul character it works fine:
>
>     zz <- rawToChar(charToRaw("foo"))
>     zz %in% c("foo", "bar")
>     [1] TRUE
>
> I don't see anything about this in the latest NEWS, but is this
> expected behaviour? Or is it, as I suspect, a bug? This seems to be
> new to R 2.7.0, as I said.

The short answer is that your example works in R-2.6 and in the current R-devel. Whether the behavior in R-2.7 is a bug is perhaps in the eye of the beholder.

Historically, R's internal string representation allowed for embedded nul characters. This was particularly useful before the raw vector type, RAWSXP, was introduced. Since the vast majority of R's internal string processing functions use standard C semantics and truncate at the first nul, there has always been some room for interesting behavior. The change in R-2.7 was an attempt to start resolving these inconsistencies. Since then the core team has agreed to remove the partial support for embedded nuls in character strings -- raw can be used when this is desired, and having nul terminated strings will make the code more consistent and easier to maintain going forward.

Best Wishes,

+ seth

--
Seth Falcon | http://userprimary.net/user/
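[A small sketch of how to stay on the safe side of this change: keep nul-containing bytes as raw, and strip the terminator before converting. The behavior of nul-in-character varies across the R versions discussed here, but this pattern works in all of them:]

```r
y <- c(charToRaw("foo"), as.raw(0))   # bytes with a trailing nul
z <- rawToChar(y[y != as.raw(0)])     # drop nul bytes before converting
z %in% c("foo", "bar")                # TRUE
```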
Re: [Rd] segfault in gregexpr()
Hi again,

Herve wrote:
> gregexpr("", "abc", fixed=TRUE)
> *** caught segfault ***
> address 0x1c09000, cause 'memory not mapped'

This should be fixed in latest svn. Thanks for the report.

+ seth

--
Seth Falcon | [EMAIL PROTECTED] | blog: http://userprimary.net/user/
Re: [Rd] segfault in gregexpr()
Hi Herve,

Thanks for the report. I can reproduce this with latest R-devel. perl=TRUE is also broken. I have a patch which I am testing. With it, I get:

    gregexpr("", "abc")
    [[1]]
    [1] 1 2 3
    attr(,"match.length")
    [1] 0 0 0

    gregexpr("", "abc", fixed=TRUE)
    [[1]]
    [1] 1 2 3
    attr(,"match.length")
    [1] 0 0 0

    gregexpr("", "abc", perl=TRUE)
    [[1]]
    [1] 1 2 3
    attr(,"match.length")
    [1] 0 0 0

+ seth

--
Seth Falcon | [EMAIL PROTECTED] | blog: http://userprimary.net/user/
Re: [Rd] isOpen on closed connections
Roger D. Peng [EMAIL PROTECTED] writes:
> As far as I can tell, 'isOpen' cannot return FALSE in the case when
> 'rw = ""'. If the connection has already been closed by 'close' or
> some other function, then isOpen will produce an error. The problem
> is that when isOpen calls 'getConnection', the connection cannot be
> found and 'getConnection' produces an error. The check to see if it
> is open is never actually done.

I see this too with R-devel (r43376) {from Nov 6th}.

    con = file("example1", "w")
    isOpen(con)
    [1] TRUE
    showConnections()
      description class  mode text   isopen   can read can write
    3 "example1"  "file" "w"  "text" "opened" "no"     "yes"
    close(con)
    isOpen(con)
    Error in isOpen(con) : invalid connection
    ## printing also fails
    con
    Error in summary.connection(x) : invalid connection

> This came up in some code where I'm trying to clean up connections
> after successfully opening them. The problem is that if I try to
> close a connection that has already been closed, I get an error
> (because 'getConnection' cannot find it). But then there's no way for
> me to find out if a connection has already been closed. Perhaps
> there's another approach I should be taking? The context is
> basically,
>
>     con <- file("foo", "w")
>     tryCatch({
>         ## Do stuff that might fail
>         writeLines(stuff, con)
>         close(con)
>         file.copy("foo", "bar")
>     }, finally = {
>         close(con)
>     })

This doesn't address isOpen, but why do you have the call to close inside the tryCatch block? Isn't the idea that finally will always be run, so you can be reasonably sure that close gets called once?

If your real world code is more complicated, perhaps you can make use of a work around like:

    myIsOpen = function(con) tryCatch(isOpen(con), error=function(e) FALSE)

You could do similar with myClose and close a connection as many times as you'd like :-)

+ seth

--
Seth Falcon | [EMAIL PROTECTED] | blog: http://userprimary.net/user/
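[A sketch of the combined pattern being discussed, with close() appearing only in finally so it runs exactly once whether or not the body fails; file names are invented:]

```r
con <- file("foo", "w")
tryCatch({
    writeLines("stuff", con)   # work that might fail
    flush(con)                 # make sure data is on disk before copying
    file.copy("foo", "bar")
}, finally = {
    ## tolerate a connection that has already become invalid
    if (tryCatch(isOpen(con), error = function(e) FALSE))
        close(con)
})
```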
Re: [Rd] RSQLite indexing
Jeffrey Horner [EMAIL PROTECTED] writes:
> Thomas Lumley wrote on 10/22/2007 04:54 PM:
> > I am trying to use RSQLite for storing data and I need to create
> > indexes on two variables in the table. It appears from searching
> > the web that the CREATE INDEX operation in SQLite is relatively
> > slow for large files, and this has been my experience as well.

What is your schema? In particular, are things that are integers or floats being stored that way in SQLite?

I believe the annotation data packages via AnnotationDbi are using cache_size=64000 and synchronous=0 and that this was determined by a handful of experiments on typical annotation dbs.

> Columns with few levels may not benefit from an index. See this
> thread:
> http://thread.gmane.org/gmane.comp.db.sqlite.general/23683/focus=23693

But your column with many levels shouldn't suffer this problem :-)

+ seth

--
Seth Falcon | [EMAIL PROTECTED] | blog: http://userprimary.net/user/
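[A sketch of applying the PRAGMA settings mentioned above before a bulk load; the database, table, and column names are invented for illustration, and the PRAGMA values are the ones reported for the annotation dbs:]

```r
library(RSQLite)

con <- dbConnect(SQLite(), dbname = "big.db")

## larger page cache, and no fsync while (re)building the db
dbGetQuery(con, "PRAGMA cache_size = 64000")
dbGetQuery(con, "PRAGMA synchronous = 0")

## bulk-load first, index afterwards: building an index over a loaded
## table is typically cheaper than maintaining it row by row
## dbWriteTable(con, "probes", df)   # hypothetical data frame
dbGetQuery(con, "CREATE INDEX idx_probe ON probes (probe_id)")
```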
Re: [Rd] warning upon automatic close of connection
Gabor Grothendieck [EMAIL PROTECTED] writes:
> I noticed that under R 2.6.0 there is a warning about closing the
> connection in the code from this post:
> https://stat.ethz.ch/pipermail/r-help/2007-September/140601.html
> which is evidently related to the following from the NEWS file:
>
>   o Connections will be closed if there is no R object referring to
>     them. A warning is issued if this is done, either at garbage
>     collection or if all the connection slots are in use.
>
> If we use read.table directly it still happens:
>
>     # use Lines and Lines2 from cited post
>     library(zoo)
>     DF1 <- read.table(textConnection(Lines), header = TRUE)
>     DF2 <- read.table(textConnection(Lines2), header = TRUE)
>     z1 <- zoo(as.matrix(DF1[-1]), as.Date(DF1[,1], "%d/%m/%Y"))
>     z2 <- zoo(as.matrix(DF2[-1]), as.Date(DF2[,1], "%d/%m/%Y"))
>     both <- merge(z1, z2)
>     plot(na.approx(both))
>     R.version.string # Vista
>     [1] "R version 2.6.0 alpha (2007-09-06 r42791)"
>
> Is this annoying warning really necessary? I assume we can get rid of
> it by explicitly naming and closing the connections but surely there
> should be a way to avoid the warning without going to those lengths.

Up until the change you mention above it really was necessary to name and close all connections. Short scripts run in fresh R sessions may not have had problems with code like you have written above, but longer programs or shorter ones run in a long running R session would run out of connections. Now that connections have weak reference semantics, one can ask whether this behavior should be standard and no warning issued.

> I would have thought that read.table opens the connection then it
> would close it itself so no warning would need to be generated.

In your example, read.table is _not_ opening the connection. You are passing an open connection which has no symbol bound to it:

    c = textConnection("foo")
    c
         description            class             mode             text
               "foo" "textConnection"              "r"           "text"
              opened         can read        can write
            "opened"            "yes"             "no"

But I think passing a closed connection would cause the same sort of issue.

It seems that there are two notions of closing a connection: (i) close as the opposite of open, and (ii) clean up the entire connection object. I haven't looked closely at the code here, so I could be wrong, but I'm basing this guess on the following:

    file("foo")
    description      class       mode       text     opened
          "foo"     "file"        "r"     "text"   "closed"
       can read  can write
          "yes"      "yes"

    ## start new R session
    for (i in 1:75) file("foo")
    gc()
             used (Mb) gc trigger (Mb) max used (Mb)
    Ncells 149603  4.0         35  9.4       35  9.4
    Vcells 101924  0.8     786432  6.0   486908  3.8
    There were 50 or more warnings (use warnings() to see the first 50)
    warnings()[1:3]
    $`closing unused connection 76 ("foo")`
    NULL
    $`closing unused connection 75 ("foo")`
    NULL
    $`closing unused connection 74 ("foo")`
    NULL

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
Re: [Rd] R CMD check recursive copy of tests/
Henrik Bengtsson [EMAIL PROTECTED] writes:
> intentional I'd say: I did not implement it, but it seems much more
> logical to keep the previous rule: All *.R files in ./tests/ are run,
> period.
>
> Subdirectories can be useful for organization, notably storing test
> data. I don't think it's a good idea to use so very many test files
> that you need subdirectories, unless maybe you are thinking about
> unit tests; and then, see below.
>
> Examples of subdirectories (some overlapping) are:
>
>   units/         - tests of minimal code modules
>   integration/   - tests of integrating the above units
>   system/        - real-world scenarios/use cases
>   requirements/  - every requirement should have at least one test
>   bugs/          - every bug fix should come with a new test
>   regression/    - every update should have a regression test to
>                    validate backward compatibility etc.
>   robustness/    - testing the robustness of estimators against
>                    outliers as well as extreme parameter settings
>   validation/    - validation of numeric results compared with
>                    alternative implementations or summaries
>   benchmarking/  - actually more measuring time, but can involve
>                    validation that a method is faster than an
>                    alternative
>   crossplatform/ - validate correctness across platforms
>   torture/       - pushing the limits

Those all seem like reasonable examples, but the fact that R CMD check doesn't recurse really isn't a problem. You can have a driver script at the top-level that runs as many of the tests in subdirs as you want. And this is really a good thing since, as you mentioned later in your response, some tests take a long time to run and probably are best not automatically run during R CMD check.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
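[A sketch of such a top-level driver; the file and directory names are invented. R CMD check runs every *.R file in tests/, so this one file decides which subdirectories get exercised routinely:]

```r
## tests/run-subdir-tests.R (hypothetical driver script)
## Sources every .R file in the chosen subdirectories; slow suites
## (e.g. torture/) are simply left off this list.
for (dir in c("units", "bugs", "regression")) {
    for (f in list.files(dir, pattern = "\\.R$", full.names = TRUE)) {
        cat("Running", f, "\n")
        source(f)
    }
}
```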
Re: [Rd] R CMD check: Error in function (env) : could not find function finalize
Hi Henrik, Henrik Bengtsson [EMAIL PROTECTED] writes: Hi, does someone else get this error message: Error in function (env) : could not find function finalize? I get an error when running examples in R CMD check (v2.6.0; session info below): [snip] The error occurs in R CMD check but also when start a fresh R session and run, in this case, affxparser.Rcheck/affxparser-Ex.R. It always occur on the same line. So does options(error=recover) help in determining where the error is coming from? If you can narrow it down, gctorture may help or running the examples under valgrind. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] .Call and to reclaim the memory by allocVector
Hi Yongchao,

Yongchao Ge [EMAIL PROTECTED] writes:
> Why am I storing a large dataset in R? My program consists of two
> parts. The first part is to get the intermediate results, the
> computation of which takes a lot of time. The second part contains
> many different functions to manipulate the intermediate results. My
> current solution is to save the intermediate results in a temporary
> file, but my final goal is to save them as an R object. The memory
> leak in .Call stops me from doing this and I'd like to know if I can
> have a clean solution for the R package I am writing.

There are many examples of packages that use .Call to create large objects. I don't think there is a memory leak. One thing that may be catching you up is that because of R's pass-by-value semantics, you may be ending up with multiple copies of the object on the R side during some of your operations.

I would recommend recompiling with --enable-memory-profiling and using tracemem() to see if you can identify places where copies of your large object are occurring. You can also take a look at Rprof(memory.profile=TRUE).

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
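[A sketch of the copy-spotting approach; it requires R configured with --enable-memory-profiling, and the large object here is just a stand-in:]

```r
x <- numeric(1e7)   # stand-in for the large object built via .Call
tracemem(x)         # from now on, every duplication of x is reported
x2 <- x             # no copy yet: just another reference to the vector
x2[1] <- 0          # modifying the shared vector triggers the actual
                    # copy, and tracemem prints a line saying so
```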
Re: [Rd] Overriding S4 methods in an installed package
Allen McIntosh [EMAIL PROTECTED] writes:
> Is it possible to override S4 methods in an installed package? The
> naive
>
>     library(pkg)
>     setMethod("foo", signature(obj = "bar"),
>               function(obj, x, y) { new definition },
>               where = "package:pkg")
>
> results in the error
>
>     Error in setMethod("foo", signature(obj = "bar"), function(obj, :
>       the environment "pkg" is locked; cannot assign methods for
>       function "foo"
>
> (This is from R 2.5.1 on Fedora Core 5, if that matters)
>
> Background: A colleague claims to have found an error in a package.
> He and I would prefer to do some experimentation before contacting
> the authors. Subclassing is the correct way to do this, and I expect
> we will eventually subclass for other reasons, but I was wondering if
> an override was possible and easier.

If foo is a generic that you are calling directly, then you can probably define it in the global environment (omit the where arg) and test it that way. OTOH, if foo is used by pkg internally, then it will be much easier to simply edit the source for pkg, reinstall, and test. If you find and fix a bug, most package maintainers will be quite happy to integrate your fix.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
Re: [Rd] [R] Suspected memory leak with R v.2.5.x and large matrices with dimnames set
Hi Peter,

Peter Waltman [EMAIL PROTECTED] writes:
> Admittedly, this may not be the most sophisticated memory profiling
> performed, but when using unix's top command, I'm noticing a notable
> memory leak when using R with a large matrix that has dimnames set.

I'm not sure I understand what you are reporting. One thing to keep in mind is that how memory released by R is handled is OS dependent, and one will often observe that after R frees some memory, the OS does not report that amount as now free.

Is what you are observing preventing you from getting things done, or is it just a concern that there is a leak that needs fixing?

It is worth noting that the internal handling of character vectors has changed in R-devel, and so IMO testing there would make sense before pursuing this further; I suspect your results will be different.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
Re: [Rd] package dependencies
Zhenhuan Cui [EMAIL PROTECTED] writes: I created an add-on R package. In this package, there is a line require(pckgname), because I need to call some functions in pckgname. My package is successfully built and can be successful installed. But R CMD check can not be executed. The error message is: Instead of require(pkgname), simply list pkgname in the Depends field of your package's DESCRIPTION file. See the Writing R Extensions manual for details. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Convert multiple C strings into an R character vector
Jonathan Zhou [EMAIL PROTECTED] writes:
> I was hoping someone could tell me how to convert multiple C
> character strings into an R character vector.

Here's a quick untested sketch:

    char **yourStrings;
    int numStrings = /* the length of yourStrings */;
    int i;
    SEXP cvect;

    PROTECT(cvect = allocVector(STRSXP, numStrings));
    for (i = 0; i < numStrings; i++) {
        SET_STRING_ELT(cvect, i, mkChar(yourStrings[i]));
    }
    UNPROTECT(1);
    return cvect;

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
Re: [Rd] Using R_MakeExternalPtr
Jonathan Zhou [EMAIL PROTECTED] writes:
> Hi all,
>
> I've been writing a package and I've run into a problem that I'm
> unsure how to solve. I am looking to pass a C++ class object to R so
> that it may be passed back to another C++ function later on to be
> used. I'm quite new to R and this is my first time writing a package,
> so I hope you can bear with me. The following is how I create the
> class and use R_MakeExternalPtr(). This occurs in a function called
> soamInit:
>
>     Session* sesPtr = conPtr->createSession(attributes);
>     void* temp = session;

It isn't clear from your example, are you sure that temp is valid at this point?

>     SEXP out = R_MakeExternalPtr(temp, R_NilValue, R_NilValue);

I was expecting to see:

    SEXP out = R_MakeExternalPtr((void *)sesPtr, R_NilValue, R_NilValue);

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
Re: [Rd] X11() dies in remote background
Vincent Carey 525-2265 [EMAIL PROTECTED] writes: this is not a problem with R but a request for related advice. i am trying to run a lengthy batch job from my home. the OS is ... Linux jedi.bwh.harvard.edu 2.4.22-openmosix1smp #1 SMP Fri Sep 5 01:05:37 CEST 2003 i686 athlon i386 GNU/Linux i start the job and put it in the background. while i am connected, all is well. eventually my ISP shuts down the connection if i do not do any input. One thing you might try is using screen. The screen program lets you multiplex terminals in a single window, but the feature you want here is that it allows you to detach and reattach to a session. So you could start a screen session at work or home, start something running, detach, and then come back later and attach to see how things are going. However, screen may further complicate your desire to use X11(), but perhaps with Xvfb run from the screen session things will work. Do all of the graphics devices require access to X11()? I thought you could use pdf() for example, without X11() but I'm not certain. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
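[The screen workflow sketched as a session transcript; the session and script names are invented:]

```
$ screen -S rjob          # start a named screen session on the server
$ R CMD BATCH long-job.R  # launch the batch job inside it
  # press Ctrl-a d to detach; the job keeps running after the ISP
  # drops the connection or you log out
$ screen -r rjob          # later, from any new login: reattach and
                          # inspect progress
```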
[Rd] dict package: dictionary data structure for R
Hi all, The dict package provides a dictionary (hashtable) data structure much like R's built-in environment objects, but with the following differences: - The Dict class can be subclassed. - Four different hashing functions are implemented and the user can specify which to use when creating an instance. I'm sending this here as opposed to R-packages because this package will only be of interest to developers and because I'd like to get feedback from a slightly smaller community before either putting it on CRAN or retiring it to /dev/null. The design makes it fairly easy to add additional hashing functions, although currently this must be done in C. If nothing else, this package should be useful for evaluating hashing functions (see the vignette for some examples). Source: R-2.6.x: http://userprimary.net/software/dict_0.1.0.tar.gz R-2.5.x: http://userprimary.net/software/dict_0.0.4.tar.gz Windows binary: R-2.5.x: http://userprimary.net/software/dict_0.0.4.zip + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] dict package: dictionary data structure for R
Gabor Grothendieck [EMAIL PROTECTED] writes:
> Although the proto package is not particularly aimed at hashing note
> that it covers some of the same ground and also is based on a well
> thought out object model (known as object-based programming or
> prototype programming).

Interesting. The dict package differs from proto in that it _is_ aimed at hashing and:

- It is S4 based
- It does not use R's environment objects to implement its hashtables
  (proto uses environments).

In Bioconductor, we have many hashtables where the key is an Affymetrix probeset ID. These look sort of like "1000_at". It turns out that the algorithm used by R's environments is not very good at hashing these values. The dict package lets you investigate this:

    library(dict)
    keys2 = paste(seq(1000, length=13000), "at", sep="_")

    # here, hash.alg=0L corresponds to the hashing function used by R's
    # environments. I know, a name would be better.
    summary(as.integer(table(hashCodes(keys=keys2, hash.alg=0L, size=2^14))))
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        800    1100    1500    1625    2025    2700

    # hash.alg=1L is djb2 from here: http://www.cse.yorku.ca/~oz/hash.html
    summary(as.integer(table(hashCodes(keys=keys2, hash.alg=1L, size=2^14))))
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.000   1.000   2.000   1.648   2.000   4.000

    # and this is what we see with an environment:
    e = new.env(hash=T, size=2^14)
    for (k in keys2) e[[k]] = k
    summary(env.profile(e)$counts)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.0000  0.0000  0.0000  0.7935  0.0000 2700.0000

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
Re: [Rd] S4 coerce
Paul Gilbert [EMAIL PROTECTED] writes:
> (I am not sure if this is a bug or a request for a more
> understandable warning, or possibly something obvious I should be
> posting on r-help.)
>
> I am trying to coerce a new class object to be a DBIConnection and it
> does not work the way I think it should:
>
>     R version 2.5.1 (2007-06-27)
>     ...
>     require(RMySQL)  # or require(RSQLite)
>     Loading required package: RMySQL
>     Loading required package: DBI
>     [1] TRUE
>     m <- dbDriver("MySQL")  # or m <- dbDriver("SQLite")
>     con <- dbConnect(m, dbname="test")
>     dbGetQuery(con, "create table zzz (
>     +   vintage VARCHAR(20) NOT NULL,
>     +   alias VARCHAR(20) default NULL,
>     +   Documentation TEXT,
>     +   PRIMARY KEY (vintage)
>     + );")
>     NULL
>     dbListTables(con)
>     [1] "zzz"
>     setClass("TSconnection", representation(con="DBIConnection",
>     +   vintage = "logical",
>     +   panel = "logical")
>     + )
>     [1] "TSconnection"
>     setAs("TSconnection", "DBIConnection", def = function(from) from@con)

I think things work as you expect up until this point.

>     setIs("TSconnection", "DBIConnection", coerce = function(x) x@con)

I'm confused about what you want to do here. If you want TSconnection to be a DBIConnection, why wouldn't you use inheritance?

    setClass("TSconnection", contains="DBIConnection", ...)

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
Re: [Rd] Getting param names of primitives
Prof Brian Ripley [EMAIL PROTECTED] writes: My problem is that if we make formals() work on primitives, people will expect formals(log) <- value to work, and it cannot. But it could give an informative error message. Asking for formals() seems to make sense, so making it work seems like a good idea. I'll agree that making it work might encourage someone to try formals<-(), but the fact that formals<-() cannot do anything but error seems like a strange reason not to make formals() work. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
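To make the asymmetry concrete, a small sketch (behavior as of the R versions discussed in this thread, where formals() on a primitive returned NULL; args() was the usual workaround):

```r
## log is a primitive, so at the time of this thread formals(log)
## returned NULL. args() builds a dummy closure with the documented
## argument list, whose formals can then be inspected.
formals(log)          # NULL for a primitive
args(log)             # function (x, base = exp(1)) NULL
formals(args(log))    # workaround: a pairlist with elements 'x' and 'base'
```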
Re: [Rd] step() in sink() and Sweave()
Jari Oksanen [EMAIL PROTECTED] writes: On Wed, 2007-06-13 at 13:33 +0100, Gavin Simpson wrote: Dear Developers, This change also just bit me whilst updating Sweave documents for some computer classes. Is there a work-around that can be employed so that we get both the message() bits and the print() bits in the same place for our Sweave files? If not, is there any point in filing this as a bug in R? I see there have been no (public) responses to Jari's email, yet the change is rather annoying, and I do not see the rationale for printing different parts of the output from step() in two different ways. I think this is a bug. You should not use message() with optional trace. The template for the usage in step() is first if (trace) message() and later if (trace) print(). If you specifically request printing by setting trace = TRUE, then you should not get message(). Interestingly, message() seems to be a warning() that cannot be suppressed by setting options. message is a condition and so is a warning. This means you have some control over them. For example, you can create a wrapper for step that uses withCallingHandlers to cat out all messages (or print them, or email them to your friends :-)

mystep <- function(object, scope, scale = 0,
                   direction = c("both", "backward", "forward"),
                   trace = 1, keep = NULL, steps = 1000, k = 2, ...)
{
    withCallingHandlers(step(object=object, scope=scope, scale=scale,
                             direction=direction, trace=trace, keep=keep,
                             steps=steps, k=k, ...),
                        message=function(m) {
                            cat(conditionMessage(m))
                        })
}

This is so annoying that I haven't updated some of my Sweave documents. It is better to have outdated documents than crippled documents. I'm not trying to argue that the function shouldn't change, but if it is so annoying, you can also resolve this problem by defining your own step function and calling it (forgetting about withCallingHandlers). Clearly not ideal, but at the same time in the spirit of open source, no? 
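One refinement worth noting for the wrapper idea above: with withCallingHandlers() the default message handling still runs after your handler, so the text can appear twice; invoking the "muffleMessage" restart suppresses the default emission. A minimal sketch (function name hypothetical):

```r
catMessages <- function(expr) {
  withCallingHandlers(expr,
    message = function(m) {
      cat(conditionMessage(m))        # stdout, so sink()/Sweave capture it
      invokeRestart("muffleMessage")  # suppress the default stderr copy
    })
}

catMessages(message("step 1 done"))   # emitted once, via cat()
```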
+ seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] PATCH: install inst/ before doing lazyload on Windows
Seth Falcon [EMAIL PROTECTED] writes: On Windows, package files in the inst/ subdir are installed after the lazyload creation. This differs from Linux where inst/ is installed _before_ lazyload creation. Since packages may need data in inst, I think the order on Windows should be changed. Perhaps like this: This has been fixed in R-devel and R-patched. Thanks! + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] PATCH: install inst/ before doing lazyload on Windows
Hi, On Windows, package files in the inst/ subdir are installed after the lazyload creation. This differs from Linux where inst/ is installed _before_ lazyload creation. Since packages may need data in inst, I think the order on Windows should be changed. Perhaps like this: diff --git a/src/gnuwin32/MakePkg b/src/gnuwin32/MakePkg index 57af321..868e8f1 100644 --- a/src/gnuwin32/MakePkg +++ b/src/gnuwin32/MakePkg @@ -74,10 +74,10 @@ all: @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg -s nmspace @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg Dynlib @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg -s R + @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg -s $(DPKG)/demo $(DPKG)/exec $(DPKG)/inst $(DATA) ifeq ($(strip $(LAZY)),true) @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg -s lazyload endif - @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg -s $(DPKG)/demo $(DPKG)/exec $(DPKG)/inst $(DATA) ifeq ($(strip $(LAZYDATA)),true) @$(MAKE) --no-print-directory -f $(RHOME)/src/gnuwin32/MakePkg -s lazydata endif -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] package check note: no visible global function definition (in functions using Tcl/Tk)
Prof Brian Ripley [EMAIL PROTECTED] writes: It seems that it happens if package tcltk is missing from the Depends: list in the DESCRIPTION file. I just tested with Amelia and homals and that solved the various warnings in both cases. Adding tcltk to Depends may not always be the desired solution. If tcltk is already in Suggests, for example, and the intention is to optionally provide GUI features, then the code may be correct as-is. That is, codetools will issue the NOTEs if you have a function that looks like:

f <- function() {
    if (require(tcltk)) {
        someTcltkFunctionHere()
    } else {
        otherwiseFunction()
    }
}

There are a number of packages in the BioC repository that provide such optional features (not just for tcltk) and it would be nice to have a way of declaring the use such that the NOTE is silenced. [Note 1: I don't have any ideas at the moment for how this could work.] [Note 2: Despite the false-positives, I've already caught a handful of bugs by reading over these NOTEs and think they provide a lot of value to the check process] + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] missing IntegerFromString()
Aniko Szabo [EMAIL PROTECTED] writes: I am sorry about the confusion, I was too hasty. asInteger(coerceVector(x, INTSXP)) does not work after all. Here are more details of what I am trying to accomplish: I have a matrix with column names that are actually known to be integers (because I set them so myself in the R code, say, colnames(mat) <- 1:10. Of course, they become converted to character strings.) The relevant part of my code used to be:

SEXP MyFunction(SEXP mat);
int warn, minY;
SEXP rl, cl;
char *rn, *cn;
GetMatrixDimnames(mat, &rl, &cl, &rn, &cn);
minY = IntegerFromString(VECTOR_ELT(cl, 0), &warn);
if (warn > 0) error("Names of popmatrix columns are not integers");

Running some tests it appears that VECTOR_ELT(cl,0) is CHARSXP (which I wound up using without even knowing it). I tried replacing the IntegerFromString part with both asInteger(VECTOR_ELT(cl,0)) and with asInteger(coerceVector(VECTOR_ELT(cl,0), INTSXP)), but as you surmised, since VECTOR_ELT(cl,0) is CHARSXP, it does not work. So, how could I get the actual values in the column names? How about:

SEXP colnums;
int *ivals;
PROTECT(colnums = coerceVector(cl, INTSXP));
ivals = INTEGER(colnums);

Here you convert the STRSXP cl into an INTSXP. If you want the actual integer values, use the ivals pointer. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
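At the R level the same conversion is just as.integer() on the dimnames, which can serve as a quick sanity check of the data before dropping to C (illustrative only):

```r
mat <- matrix(0, nrow = 2, ncol = 10)
colnames(mat) <- 1:10                # stored as the strings "1" .. "10"
minY <- as.integer(colnames(mat))[1]
minY                                 # 1, recovered from the character name
## A non-numeric name would yield NA (with a coercion warning),
## mirroring the warn flag checked in the C code above.
```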
Re: [Rd] HTML vignette browser
Friedrich Leisch [EMAIL PROTECTED] writes: Looks good to me, and certainly something worth being added to R. 2 quick (related) comments: 1) I am not sure if we want to include links to the LaTeX sources by default, those might confuse unsuspecting novices a lot. Perhaps make those optional using an argument to browseVignettes(), which is FALSE by default? I agree that the Rnw could confuse folks. But I'm not sure it needs to be hidden or turned off by default... If the .R file was also included then it would be less confusing, I suspect, as the curious could deduce what the Rnw is about by triangulation. 2) Instead of links to .Rnw files we may want to include links to the R code - should we R CMD INSTALL a tangled version of each vignette such that we can link to it? Of course it is redundant information given the .Rnw, but we also have the help pages in several formats ready. Including, by default, links to the tangled .R code seems like a really nice idea. I think a lot of users who find vignettes don't realize that all of the code used to generate the entire document is available to them -- I just had a question from someone who wanted to know how to make a plot that appeared in a vignette, for example. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
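Users do not have to wait for such links: the tangled code can already be produced from an installed vignette's source with Stangle(). A sketch (the vignette name and the $Dir/$File fields are assumptions based on ?vignette; paths vary by installation):

```r
## Look up an installed vignette and extract its R chunks.
## "grid" is used here only as an example of a package shipping a vignette.
v <- vignette("grid", package = "grid")
rnw <- file.path(v$Dir, "doc", v$File)   # path to the .Rnw source
Stangle(rnw)                             # writes <name>.R in the working dir
```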
Re: [Rd] Possible changes to connections
mel writes: There could be/was the same debate in C/C++. That may be just a matter of education about not forgetting to close previously opened doors! R is not C/C++. In general, one does not expect to explicitly handle memory allocation and release when programming in R. Treating connections differently, when there is no longer any technical reason to do so, is surprising. Prof Brian Ripley [EMAIL PROTECTED] writes: When I ran some tests I found 7 packages on CRAN that in their tests were not closing connections. Four of those are maintained by R-core members. Even though none were by me, I think this is too easy to forget to do! I agree that it is easy to forget. It is especially easy if one creates so-called anonymous connection references like readLines(file(path)) -- this anonymous idiom seems natural to me when coding R and it would be nice to make it work for connections. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Possible changes to connections
Hi, One more comment on this thread... Jeffrey Horner [EMAIL PROTECTED] writes: Prof Brian Ripley wrote: When I originally implemented connections in R 1.2.0, I followed the model in the 'Green Book' closely. There were a number of features that forced a particular implementation, and one was getConnection() that allows one to recreate a connection object from a number. [...] Another issue is that the current connection objects can be saved and restored but refer to a global table that is session-specific, so they lose their meaning (and perhaps gain an unintended one). What I suspect is that very few users are aware of the Green Book description and so we have freedom to make some substantial changes to the implementation. Both issues suggest that connection objects should be based on external pointers (which did not exist way back in 1.2.0). Sounds great! I would also like to see the following interface (all or in parts) added for working with connections from C. This is an update to the patch I created here: http://wiki.r-project.org/rwiki/doku.php?id=developers:r_connections_api I wanted to voice a "me too" for wanting to see an interface added for working with connections from C in package code. There are a number of places where this would be useful and provide a cleaner solution than what is possible today. The proposed interface looks useful. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] S4 assignment \alias and \usage
Paul Gilbert [EMAIL PROTECTED] writes: What is the Rd file alias and usage syntax for an S4 assignment method? I have been trying variations on

\alias{TSdoc<-,default-method}
\usage{
\S4method{TSdoc}{default}(x) <- value

but so far I have not got it right according to various codoc, etc, checks. If you have your own generic TSdoc<-, then I think you want:

\alias{TSdoc<-}
\alias{TSdoc<-,someClass,anotherClass-method}

You may not be allowed to specify usage, but I think the issue only arises when setting methods for a generic documented elsewhere. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] S4 assignment \alias and \usage
Paul Gilbert [EMAIL PROTECTED] writes: Let me back up a bit, I may be making another mistake. My code has

setGeneric("TSdoc<-", def = function(x, value) standardGeneric("TSdoc<-"),
           useAsDefault = function(x, value) { attr(x, "TSdoc") <- value; x })

setGeneric("TSdoc", def = function(x) standardGeneric("TSdoc"),
           useAsDefault = function(x) attr(x, "TSdoc"))

Aside: It seems odd to me to define such defaults. How do you know x is going to have a TSdoc attribute? -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Possible changes to connections
Prof Brian Ripley [EMAIL PROTECTED] writes: When I originally implemented connections in R 1.2.0, I followed the model in the 'Green Book' closely. There were a number of features that forced a particular implementation, and one was getConnection() that allows one to recreate a connection object from a number. I am wondering if anyone makes use of this, and if so for what? I don't see any uses of it in the Bioconductor package sources. It would seem closer to the R philosophy to have connection objects that get garbage collected when no R object refers to them. This would allow for example readLines(gzfile("foo.gz")) I think this would be a nice improvement as it matches what many people already assume happens as well as matches what some other languages do (in particular, Python). + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
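Until something like that lands, the safe idiom is to bind the connection to a name and close it with on.exit(), so it is released even if an error interrupts the read. A sketch (function name hypothetical):

```r
readLinesSafely <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))      # runs on normal return and on error alike
  readLines(con)
}
## versus the anonymous readLines(gzfile(path)), which leaves the
## connection registered in the global table after the call returns
```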
[Rd] Recent changes in R related to CHARSXPs
Hello all, I want to highlight a recent change in R-devel to the larger developeR community. As of r41495, R maintains a global cache of CHARSXPs such that each unique string is stored only once in memory. For many common use cases, such as dimnames of matrices and keys in environments, the result is a significant savings in memory (and time under some circumstances). A result of these changes is that CHARSXPs must be treated as read only objects and must never be modified in-place by assigning to the char* returned by CHAR(). If you maintain a package that manipulates CHARSXPs, you should check to see if you make such in-place modifications. If you do, the general solution is as follows: If you need a temp char buffer, you can allocate one with a new helper macro like this: /* CallocCharBuf takes care of the +1 for the \0, so the size argument is the length of your string. */ char *tmp = CallocCharBuf(n); /* manipulate tmp */ SEXP schar = mkChar(tmp); Free(tmp); You can also use R_alloc which has the advantage of not having to free it in a .Call function. The mkChar function now consults the global CHARSXP cache and will return an already existing CHARSXP if one with a matching string exists. Otherwise, it will create a new one and add it to the cache before returning it. In a discussion with Herve Pages, he suggested that the return type of CHAR(), at least for package code, be modified from (char *) to (const char *). I think this is an excellent suggestion because it will allow the compiler to alert us to package C code that might be modifying CHARSXPs in-place. This hasn't happened yet, but I'm hoping that a patch for this will be applied soon (unless better suggestions for improvement arise through this discussion :-) One other thing is worth mentioning: at present, not all CHARSXPs are captured by the cache. I think the goal is to refine things so that all CHARSXPs _are_ in the cache. 
At that point, strcmp calls can be replaced with pointer comparisons which should provide some nice speed ups. So part of the idea is that the way to get CHARSXPs is via mkChar or mkString and that one should not use allocString, etc. Finally, here is a comparison of time and memory for loading all the environments (hash tables) in Bioconductor's GO annotation data package.

## unpatched
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  168891  9.1     350000 18.7   350000 18.7
Vcells  115731  0.9     786432  6.0   425918  3.3
> library(GO)
> system.time(for (e in ls(2)) get(e))
   user  system elapsed
 51.919   1.168  53.228
> gc()
            used  (Mb) gc trigger   (Mb) max used  (Mb)
Ncells  17879072 954.9   19658017 1049.9 18683826 997.9
Vcells  31702823 241.9   75190268  573.7 53912452 411.4

## patched
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  154717  8.3     350000 18.7   350000 18.7
Vcells  133613  1.1     786432  6.0   483138  3.7
> library(GO)
> system.time(for (e in ls(2)) get(e))
   user  system elapsed
 31.166   0.736  31.998
> gc()
            used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   5837253 311.8    6910418 369.1  6193578 330.8
Vcells  16831859 128.5   45712717 348.8 39456690 301.1

Best Wishes, + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] relist, an inverse operator to unlist
Andrew Clausen [EMAIL PROTECTED] writes: Hi Seth, On Mon, May 21, 2007 at 05:15:10PM -0700, Seth Falcon wrote: I will also add that the notion of a default argument on a generic function seems a bit odd to me. If an argument is available for dispatch, I just don't see what sense it makes to have a default. In those cases, the default should be handled by the method that has a signature with said argument matching the missing class. What often does make sense is to define a generic function where some arguments are not available for dispatch. For example: setGeneric("foo", signature="flesh", function(flesh, skeleton=attr(flesh, "skeleton")) standardGeneric("foo")) That's an excellent suggestion. Thanks! However, I had to set the signature to c("numeric", "missing") rather than just "numeric". I have uploaded a new version here: http://www.econ.upenn.edu/~clausen/computing/relist.R I misunderstood. You aren't using S4 classes/methods at all and so I don't actually see how my comments could have been helpful in any way. relist seems like a really odd solution to me, but based on the discussion I guess it has its use cases. Best, + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RFC: adding an 'exact' argument to [[
Hi again, Robert has committed the proposed patch to R-devel. So [[ now has an 'exact' argument and the behavior is as described: Seth Falcon [EMAIL PROTECTED] writes: 1. [[ gains an 'exact' argument with default value NA 2. Behavior of 'exact' argument: exact=NA partial matching is performed as usual, however, a warning will be issued when a partial match occurs. This is the default. exact=TRUE no partial matching is performed. exact=FALSE partial matching is allowed and no warning issued if it occurs. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
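A quick sketch of the three behaviors (note that the exact=NA default described here was tightened in later R releases, so the unadorned call may behave differently in a current R):

```r
x <- list(alpha = 1, beta = 2)

x[["alp", exact = TRUE]]    # NULL: partial matches are refused
x[["alp", exact = FALSE]]   # 1: partial match allowed, silently
x[["alp"]]                  # under the exact=NA default above: matches, with a warning
x[["alpha"]]                # 1: exact matches always work
```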
Re: [Rd] Passing R CMD Check without data
Arjun Ravi Narayan [EMAIL PROTECTED] writes: I have a package which passes R CMD check with the --no-vignettes option. However, it does not pass the check without that, as the vignette relies on some data files that I cannot distribute. However, I would like the package to pass the check so that I can put it on CRAN, so that other people with access to the dataset can put the data into the package, and then rebuild the vignettes themselves. I would recommend having a separate vignette that uses toy data, if that is all that is available, that demonstrates the basic use of the package. A considerable part of the value of a package vignette, IMHO, is having something that (i) the user can run interactively on their own, and (ii) can be automatically checked. Your current vignette can be included as pdf (the Rnw could live in another place under inst/). You might also look at the vsn package in Bioconductor which uses a Makefile to keep R CMD check from building its vignette because it is too time consuming... + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] relist, an inverse operator to unlist
Hi Andrew, Andrew Clausen [EMAIL PROTECTED] writes: For reasons I can't explain, the code I posted worked in my session, but didn't work when I started a fresh one. standardGeneric() seems to get confused by defaults for missing arguments. It looks for a missing method with this code:

relist <- function(flesh, skeleton=attr(flesh, "skeleton")) {
    standardGeneric("relist")
}

This looks very odd to me. If you are creating an S4 generic function, why are you not calling setGeneric? Or has that part of the code simply been omitted from your post? I will also add that the notion of a default argument on a generic function seems a bit odd to me. If an argument is available for dispatch, I just don't see what sense it makes to have a default. In those cases, the default should be handled by the method that has a signature with said argument matching the missing class. What often does make sense is to define a generic function where some arguments are not available for dispatch. For example:

setGeneric("foo", signature="flesh",
           function(flesh, skeleton=attr(flesh, "skeleton"))
               standardGeneric("foo"))

+ seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
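A self-contained version of that suggestion (generic, method, and data names hypothetical), showing that skeleton stays out of dispatch yet still gets a default; repeating the default in the method keeps the two definitions in sync:

```r
setGeneric("relist2", signature = "flesh",
           function(flesh, skeleton = attr(flesh, "skeleton"))
               standardGeneric("relist2"))

setMethod("relist2", "numeric",
          function(flesh, skeleton = attr(flesh, "skeleton")) {
              ## regroup 'flesh' into pieces shaped like 'skeleton'
              split(flesh, rep(seq_along(skeleton), lengths(skeleton)))
          })

x <- 1:5
attr(x, "skeleton") <- list(a = 1:2, b = 3:5)
relist2(x)   # skeleton picked up from the attribute by default
```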
Re: [Rd] RFC: adding an 'exact' argument to [[
Bill Dunlap [EMAIL PROTECTED] writes: This sounds interesting. Do you intend to leave the $ operator alone, so it will continue to do partial matching? I suspect that that is where the majority of partial matching for list names is done. The current proposal will not touch $. I agree that most intentional partial matching uses $ (hopefully only during interactive sessions). The main benefit of our proposed change is more reliable package code. For long lists and certain patterns of use, there are also performance benefits:

kk <- paste("abc", 1:(1e6), sep="")
vv <- as.list(1:(1e6))
names(vv) <- kk
system.time(vv[["fooo", exact=FALSE]])
   user  system elapsed
  0.074   0.000   0.074
system.time(vv[["fooo", exact=TRUE]])
   user  system elapsed
  0.042   0.000   0.042

It might be nice to have an option that made x$partial warn so we would fix code that relied on partial matching, but that is lower priority. I think that could be useful as well. To digress a bit further in discussing $... I think the argument that partial matching is desirable because it saves typing during interactive sessions now has a lot less weight. The recent integration of the completion code gives less typing and complete names. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RFC: allow packages to advertise vignettes on Windows
Simon Urbanek [EMAIL PROTECTED] writes: Seth, we already *have* vignette registration in place [vignette()] and we already *have* support in the GUIs (I'm talking e.g. about the Mac GUI here which uses vignette() to build a vignettes browser). Yes, fine. I agree that vignette() provides most of what is needed in terms of the implementation details and could replace most of the code that I posted, but it doesn't mean there is nothing to do. What you propose circumvents all the mechanisms already in place and replicates the same functionality. I'll repeat my question: what is wrong with the current approach? Why do you want to add a parallel approach? What is wrong with the current approach is that, at least on Windows, vignettes are not as easily accessible as they should be. vignette() is fine as an implementation detail for GUI developers. It is a bit silly for beginning users who will have a much better chance of getting to such introductory documentation if it is part of the GUI. Gaah, I feel like a broken record. What we have had with the code in Biobase is a menu of vignettes for _attached_ packages. Given the total number of packages that could be installed and given the fact that running code in a vignette requires said package to be attached, I think this makes a lot of sense [And I think this would improve the usability of the OS X vignette browser because the list is long, the vignettes for an individual package are not sensibly ordered, etc]. A menu is not perfect, but limiting to attached packages makes it a useful solution until more robust browsers etc get to the top of someone's TODO list. But YMMV and what I've proposed does not require your OS X GUI to change _anything_. So, as a small step, I'm trying to get vignettes for attached packages to be easily accessible via the Windows GUI. I don't care all that much about the particulars -- and am certainly not attached to the code that I posted. 
What the vignette() function does not provide for is a hook such that a GUI can add the vignette info for attached packages. Comments from others in this thread suggest that there is a desire that this be an opt-in feature for package authors [I don't really understand this desire as it seems to me it should be a feature/decision of the GUI] and again vignette() doesn't help. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] RFC: allow packages to advertise vignettes on Windows
Hello, The vignette concept, which started in Bioconductor, seems to be catching on. Vignettes are supported by R CMD build/check and documented in the Writing R Extensions manual. I think vignettes are a fantastic way to introduce new users to a package. However, getting new users to realize that a vignette is available can be challenging. For some time now, we have had a function in Biobase that creates a Vignettes menu item in the R Windows GUI and gives packages a mechanism to register their vignettes so that they appear on this menu. I would like to see this functionality included in R so that there can be a standard mechanism, not dependent on Biobase, for registering a package's vignettes with one of the R GUIs (currently only Windows is supported, but I imagine the OS X GUI could also implement this). Below is the implementation we have been using. Is there an R-core member I can interest in pushing this along? I'm willing to submit a patch with documentation, etc. + seth

addVigs2WinMenu <- function(pkgName) {
    if ((.Platform$OS.type == "windows") && (.Platform$GUI == "Rgui")
        && interactive()) {
        vigFile <- system.file("Meta", "vignette.rds", package=pkgName)
        if (!file.exists(vigFile)) {
            warning(sprintf("%s contains no vignette, nothing is added to the menu bar",
                            pkgName))
        } else {
            vigMtrx <- .readRDS(vigFile)
            vigs <- file.path(.find.package(pkgName), "doc", vigMtrx[, "PDF"])
            names(vigs) <- vigMtrx[, "Title"]
            if (!("Vignettes" %in% winMenuNames()))
                winMenuAdd("Vignettes")
            pkgMenu <- paste("Vignettes", pkgName, sep="/")
            winMenuAdd(pkgMenu)
            for (i in vigs) {
                item <- sub(".pdf", "", basename(i))
                winMenuAddItem(pkgMenu, item,
                               paste("shell.exec(\"", as.character(i), "\")",
                                     sep=""))
            }
        } ## else
        ans <- TRUE
    } else {
        ans <- FALSE
    }
    ans
}

-- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RFC: allow packages to advertise vignettes on Windows
Duncan Murdoch [EMAIL PROTECTED] writes: I'm interested in making vignettes more visible. Putting them on the menu is not the only way, but since you're offering to do the work, I think it's a good idea :-). Excellent :-) A few questions: - Should packages need to take any action to register their vignettes, or should this happen automatically for anything that the vignette() function would recognize as a vignette? My recommendation would be for automatic installation. That seems ok to me. Currently, we have a system that requires package authors to register their vignette in .onAttach (more on that below). I can't really think of a case where a package provides vignettes and doesn't want them easily accessible to new users in a GUI environment. - Should it happen when the package is installed or when it is attached? This is harder. vignette() detects installed vignettes, which is fine if not many packages have them. But I think the hope is that most packages will eventually, and then I think you wouldn't want the menu to list every package. Maybe default to attached packages, but expose the function below for people who want more? My feeling is that this is only appropriate for attached packages. As you point out, adding an entry for every installed package could create a cluttered menu (and present implementation challenges to avoid slowness). I also think that packages that get loaded via other packages' name spaces should remain in stealth mode. There is another reason to only list vignettes for attached packages. One of the primary uses of a vignette is to allow the user to work through an example use case interactively. This requires the package to be attached in almost all cases. - Should they appear in a top level Vignettes menu, or as a submenu of the Help menu? I'd lean towards keeping the top level placement, since you've already got an audience who are used to that. Sounds good. 
By the way, another way to expose vignettes is to have them automatically added to the package help topic, with links in formats that support them. I think we should do that too, but I don't know if it'll happen soon. Also sounds good, but one thing at a time, I guess. If there is some agreement about vignettes being automatically added and that this only happens when a package is attached, then I can look into modifying the existing function to handle this. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Problem calling $ inside a $ method
Hello, I wonder if this will make it through the spam filters given the subject line. I'm seeing the following when trying to call a dollar method inside of a dollar method.

> setClass("Foo", representation(d = "list"))
[1] "Foo"
> f <- new("Foo", d = list(bob = 1, alice = 2))
> ## We can call dollar at this level and it works as expected
> `$`(f, bo)
[1] 1
> `$`(f, al)
[1] 2
> ## So set a method on Foo that does this
> setMethod("$", "Foo", function(x, name) `$`(x@d, name))
[1] "$"
> ## But it doesn't work. Why?
> f$bo
NULL
> f$al
NULL
> ## Here is a hackish workaround.
> setMethod("$", "Foo",
+           function(x, name) eval(substitute(x@d$FOO, list(FOO = name))))
[1] "$"
> f$bo
[1] 1
> f$al
[1] 2

Other suggestions for workarounds? Is this a bug? + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
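The root of the puzzle is that `$` quotes its second argument: inside the method, `` `$`(x@d, name) `` looks for an element literally named "name" rather than using the character value of `name`. Using `[[`, which evaluates its subscript, sidesteps the substitute()/eval() dance. A sketch with a fresh class name so it can run alongside the transcript above:

```r
setClass("Foo2", representation(d = "list"))

## `[[` evaluates its subscript, so the character value held in 'name'
## (S4 passes the selected name as a string) is used as the key.
setMethod("$", "Foo2", function(x, name) x@d[[name]])

f2 <- new("Foo2", d = list(bob = 1, alice = 2))
f2$bob     # 1
f2$alice   # 2
```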
Re: [Rd] Possible problem with S4 dispatch
Prof Brian Ripley [EMAIL PROTECTED] writes: Note that you called selectMethod(mget, signature(x="character", envir=class(LLe))) by name rather than calling the visible function mget() (which you could have supplied as fdef). I've never really got to the bottom of the complicated searches that getGeneric() uses, but the fact that it does not just look for a visible function of that name tells you it is doing something different. What I would check from your browser is what parent.env() shows, successively until you get to the imports and then the base namespace. If mget is not in the imports, something would seem to be up with your importing of namespaces. find() is not relevant here as namespace scoping is in play: only if the mget generic is imported will it take precedence over base:::mget. (It is not clear to me what is being browsed here, and hence what namespaces are in play.) This was helpful. It seems that the strange behavior I was seeing was due to stale package installations. After reinstalling the package and all of its depends and imports, things are looking more normal. I used the following function to examine the chain of parent environments while debugging:

showEncEnvs <- function() {
    etmp <- parent.env(parent.frame())
    while (TRUE) {
        ename <- environmentName(etmp)
        cat(sprintf("Found environment: '%s'\n", ename))
        if (exists("mget", etmp, inherits=FALSE))
            cat("found mget\n")
        switch(ename, R_EmptyEnv=break, R_GlobalEnv=break)
        if (ename == "") {
            cat("  first five entries\n")
            print(ls(etmp)[1:5])
        }
        etmp <- parent.env(etmp)
    }
}

One thing to note: One might expect each import to be in the chain of parent environments. Instead all imports are merged into a single environment that is the parent of the package env. + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center http://bioconductor.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel