Re: [Rd] read.csv

2024-04-27 Thread Kevin Coombes
I was horrified when I saw John Weinstein's article about Excel turning
gene names into dates, mainly because I had been complaining about that
phenomenon for years, and it had never remotely occurred to me that you
could get a publication out of it.

I eventually rectified the situation by publishing "Blasted Cell Line
Names", describing how to match the names of cell lines as recorded by
different researchers, by applying techniques for DNA or protein sequence
alignment.

Best,
   Kevin

On Tue, Apr 16, 2024, 4:51 PM Reed A. Cartwright 
wrote:

> Gene names being misinterpreted by spreadsheet software (read.csv is
> no different) is a classic issue in bioinformatics. It seems like
> every practitioner ends up encountering this issue in due time. E.g.
>
> https://pubmed.ncbi.nlm.nih.gov/15214961/
>
> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
>
> https://www.nature.com/articles/d41586-021-02211-4
>
>
> https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates
>
>
> On Tue, Apr 16, 2024 at 3:46 AM jing hua zhao 
> wrote:
> >
> > Dear R-developers,
> >
> > I came across a somewhat unexpected behaviour of read.csv() which is
> trivial but worthwhile to note -- my data involves a protein named
> "1433E", but to save space I dropped the quotes, so the file becomes,
> >
> > Gene,SNP,prot,log10p
> > YWHAE,13:62129097_C_T,1433E,7.35
> > YWHAE,4:72617557_T_TA,1433E,7.73
> >
> > Both read.csv() and readr::read_csv() treat the prot(ein) name as the
> numeric value 1433 (possibly confusing "1433E" with scientific notation),
> which only alerted me when I tried to combine data,
> >
> > library(dplyr)  # for bind_rows()
> > all_data <- data.frame()
> > for (protein in proteins[1:7])
> > {
> >    cat(protein, ":\n")
> >    f <- paste0(protein, ".csv")
> >    if (file.exists(f))
> >    {
> >      p <- read.csv(f)
> >      print(p)
> >      if (nrow(p) > 0) all_data <- bind_rows(all_data, p)
> >    }
> > }
> >
> > proteins[1:7]
> > [1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"
> >
> > dplyr::bind_rows() failed to work due to the incompatible types, whereas
> rbind() went ahead without warnings.
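> >
> > A defensive fix (a sketch, using the column name from the file above) is
> > to pin the column type when reading:
> >
> > p <- read.csv(f, colClasses = c(prot = "character"))
> > # or, with readr:
> > # p <- readr::read_csv(f, col_types = readr::cols(prot = readr::col_character()))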
> >
> > Best wishes,
> >
> >
> > Jing Hua
> >
> > __
> > R-devel@r-project.org mailing list
> >
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] An interesting chat with ChatGPT

2023-02-13 Thread Kevin Coombes
Chat bots are like politicians, or talking dogs. The fact that they exist
is interesting. But no sane person would believe anything they say.

On Mon, Feb 13, 2023, 10:58 AM Boris Steipe 
wrote:

> Duncan -
>
> Indeed, this has now been well documented; I have called these
> constructions "Schrödinger Facts", since they arise from a superposition of
> truths in the training data that collapse into an untruth when observed.
>
>https://sentientsyllabus.substack.com/p/chatgpts-achilles-heel
>
> Now, the curious question is: why can it program? Why is its production of
> R syntax less vulnerable to this effect than a literature quote or a
> reference? Maybe it has to do with the fact that in producing correct
> syntax there is a ground truth - errors just will not run. But I'm not sure.
>
> Regardless: it signals that we need a reinvigorated culture of validation.
> Actually, ChatGPT will happily make test data for you. Which, in a way, it
> just did  ;-)
>
>
> Boris
>
>
>
>
> > On 2023-02-13, at 08:37, Duncan Murdoch 
> wrote:
> >
> > I was wondering which alternative would match in the regular expression
> "(a)|(b)" if both "a" and "b" match the target string.  What appears to
> happen is that the first match wins, and that's how it is documented in
> Python, but I was looking for docs on this in R.
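> >
> > A quick empirical probe (my own sketch; the outputs are exactly what is
> > in question here, so I won't assert them):
> >
> > regmatches("abc", regexpr("a|ab", "abc"))               # default (TRE) engine
> > regmatches("abc", regexpr("a|ab", "abc", perl = TRUE))  # PCRE engine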
> >
> > As far as I can see, the ?regex help page doesn't resolve this. So just
> for fun, I decided to ask ChatGPT, and got an interesting response.  I
> won't show the whole conversation here, but just some extracts:
> >
> > ---
> > Me:  In a regular expression in R, if both "a" and "b" match, which does
> "(a)|(b)" match?  Please include authoritative references in your answer.
> >
> > ChatGPT: [long answer about grep]
> >
> > Me:  In grep(), it doesn't matter which of a or b matches.  It's only in
> sub() or gsub() that it makes a difference.  Do you know an authoritative
> reference that discusses those functions?
> >
> > ChatGPT:  [long irrelevant answer about sub and gsub, with a reference
> to "R Cookbook" by P. Teetor.]
> >
> > Me:  Teetor's book is not authoritative, and the other reference doesn't
> discuss ambiguous alternation.
> >
> > ... finally we get to the interesting part ...
> >
> > ChatGPT: I apologize for the mistake in my previous answer.
> >
> > A more authoritative reference for the behavior of regular expressions
> in R can be found in the official R documentation.
> >
> > Regarding the behavior of ambiguous alternation, the regex section of
> the manual "An Introduction to R" explains it as follows:
> >
> > "When several alternatives are possible, the first alternative is
> chosen, so it is important to specify alternatives in the order in which
> they should be tried."
> >
> > Here is the reference for the official R documentation:
> >
> >"An Introduction to R" manual:
> https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Regular-expression-facilities
> >
> > I hope this reference provides the information you were looking for.
> > ---
> >
> >
> > All right, an authoritative answer!  The only problem is that *it
> doesn't exist*.  There's no such anchor in that document.  R-intro isn't a
> very rapidly changing document, so I could look at every change recorded in
> svn, and that section has *never* existed.
> >
> > I guess ChatGPT is a pattern matching machine, and if such a question
> had an answer, it would probably be in a location like that.  (I would have
> guessed a help page like ?regex instead of R-intro, but I had just told it
> that answer was wrong.)
> >
> > Duncan Murdoch
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> --
> Boris Steipe MD, PhD
>
> Professor em.
> Department of Biochemistry
> Temerty Faculty of Medicine
> University of Toronto
>
>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Bioc-devel] name for new BioC package

2023-02-04 Thread Kevin Coombes
For the record, as a user, I  *hated* the move from MOFA to MOFA2. Not the
new package name, but the fact that they also changed all the function
names and argument names. Mostly, they switched from using periods to
underscores. But this meant having to tediously hand-edit every script that
used MOFA in order to continue using that script in newer versions of R
(since they also discontinued supporting the MOFA package in newer
versions). Also, some of the changes produced less useful graphical
summaries, to the extent that I took the time to write my own code to
reproduce the original versions.

So, I would suggest that you at least think about how much work you are
creating for your established users before making the change. And make
choices that minimize the burden you are imposing on them.

Best,
  Kevin

On Sat, Feb 4, 2023, 1:03 AM Hervé Pagès  wrote:

> Hi Matteo.
>
> We had DESeq2 after DESeq, Rbowtie2 after Rbowtie, MOFA2 after MOFA,
> etc., so I don't see any problem, but thanks for asking!
>
> Best,
>
> H.
>
> On 03/02/2023 00:08, Matteo Tiberti wrote:
> > dear maintainers,
> >
> > I am currently listed as maintainer of Bioconductor package MoonlightR,
> designed for the prediction of cancer driver genes, which implements the
> Moonlight workflow.
> >
> > We are currently working on a second version of our workflow, called
> Moonlight2, and would like to have it released on Bioconductor as well, in
> the form of the Moonlight2R package. The new package uses similar principles as
> the current one, but will have significant changes and updates, both in
> terms of new functionality and revision of old functionalities. The
> Moonlight2R project/paper will also have, in part, a different corresponding
> authorship with respect to the current one. MoonlightR and Moonlight2R currently
> reside in two separate GitHub repositories.
> >
> > Ideally we would like to have both packages on BioConductor for the
> moment, the old one (called MoonlightR) and the new one that we intend to
> submit before the April cut-off for 3.17 (called Moonlight2R), where the
> number signifies the version of the protocol rather than the software.
> However, on the package submission list, I see that having package names
> that "imply a temporal relationship" with respect to an existing package is
> discouraged. Given the circumstances, do you think it would be possible to
> use the Moonlight2R name for the package (i.e., would it be a reason for
> rejection, or subject to revision during submission), or is it fair to keep it
> as is?
> >
> > Many thanks
> >
> > Matteo Tiberti
> >
> > Danish Cancer Society Research Center
> > Strandboulevarden 49
> > DK-2100 Copenhagen
> > Telephone: +45 35 25 73 07
> >
> >
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
> Hervé Pagès
>
> Bioconductor Core Team
> hpages.on.git...@gmail.com
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [R-pkg-devel] Mysterious "invalid z limit"

2023-01-08 Thread Kevin Coombes
I have been using R-Forge for many years for package development. And I
have been using GitLab for other projects almost as long.

However, over the past few months, the R-Forge support seems to be
decaying, and I now have several projects that won't currently build there
because of various items that appear to need fixing at their end. So, I am
actively exploring what it will take to move packages and projects to git.

I already know how to use a git client to clone a Subversion repository
from R-Forge (using "git svn"). And how to change the remote origin to push
it to a new git location. (And I  may also be willing to lose the revision
history if it is going to make the transition easier.)

I am now at the step of understanding the recent changes at GitLab with
respect to support for "Educational" or "Open Source" status, especially in
terms of how many monthly minutes of CI/CD time I can use for free. When
working on a new package, I tend to make lots of small commit-pushes, and
it sounds like each one of those will eat up minutes. So, any advice on how
to manage that issue would be greatly appreciated.

Best,
  Kevin

On Sun, Jan 8, 2023, 11:30 AM Spencer Graves <
spencer.gra...@effectivedefense.org> wrote:

>   If you use GitHub, I highly recommend using "GitHub Action" as
> described by Wickham and Bryan, R Packages:
>
>
> https://r-pkgs.org/code.html#code-style
>
>
>   I'm not sure the best way to get it set up, but I have all my
> packages on GitHub configured so each "push" that changes anything has
> "R CMD Check" run on 5 different platforms:  The release version of R on
> the latest Windows, macOS, and Ubuntu plus the development version and
> the most recent old release on Ubuntu.  I rarely run R CMD check on my
> local machine anymore:  I just "git commit" and "git push".  Then GitHub
> Action manages testing on those 5 platforms.
>
>
>   To be precise, I do "git status" before "git push" to make sure
> I
> have "committed" everything I want to commit before I "git push".  And I
> do "git pull" to make sure a collaborator hasn't "pushed" something new
> I should look at before I "git push".
>
>
>   Finally, I want to thank again Gábor Csárdi who helped me
> greatly get
> past problems I had with "GitHub Action" for my "sos" package.  He
> provided example workflows in:
>
>
>
> https://github.com/r-lib/actions/blob/v2-branch/examples/check-standard.yaml
>
>
>   I also needed LaTeX support, for which Gábor suggested the
> following:
>
>
> https://github.com/r-lib/actions/tree/v2/setup-tinytex#ctan-packages
>
>
>   Spencer Graves
>
>
> On 1/8/23 9:11 AM, Kevin R. Coombes wrote:
> > A very helpful answer. For some reason (probably because I have an
> > ancient perl script that automates the steps I take when building and
> > checking packages), I keep forgetting that the "tools" package lets me
> > do these things from within R.
> >
> > I had already isolated the offending line ("plot(obj)") inside the chunk
> > where the error occurred, and removed any additional arguments. I
> > wrapped that line in a "try" command followed by a conditional
> > "traceback()" to find the problem.  This allowed the package build to
> > knit the vignette and provide some feedback about what was going on. It
> > turned out that I had copied and pasted an assignment line of the form
> >
> > main <- [compute the title]
> >
> > from earlier in the code and pasted it directly as an argument to the
> > call to image.default. And R did exactly what I told it to (not
> > surprisingly): it interpreted the value of that assignment as the
> > unnamed "zlim" argument, since that was the next positional slot in
> > the call.
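> >
> > A toy version of the trap (my reconstruction, not the actual package
> > code):
> >
> > m <- matrix(1:4, 2, 2)
> > ## intended:  image.default(1:2, 1:2, m, main = "My title")
> > image.default(1:2, 1:2, m, main <- "My title")
> > ## Error: invalid z limits
> > ## the assignment's *value* fills the fourth positional slot, zlim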
> >
> > And yes, I still use "left arrow" <- instead of equals = for assignments.
> > (Heck, I even use emacs and ESS with a leftover keybinding that uses the
> > underscore key to insert the left arrow. Apparently, I'm ancient myself.)
> >
> >Kevin
> >
> > On 1/8/2023 5:04 AM, Duncan Murdoch wrote:
> >> On 07/01/2023 8:43 p.m., Kevin R. Coombes wrote:
> >>> Hi,
> >>>
> >>> I am in the middle of developing a new package, which contains a
> >>> markdown-knitr-html vignette. When I try to run
> >>>
> >>> R CMD build [mypackagedirectory]
> >>>
> >>> I get an error message
> >>>
> >>> Quitting from lines 330-336
> >>> Error: processing vignette  failed with diagnostics:
> >>> invalid z limits
> >>>
> >>> If I run the same markdown script interactively inside R Studio, there
> >>> is no error.
> >>> If I knit the markdown script inside R Studio, it produces the correct
> >>> HTML output, with no error.
> >>>
> >>> The offending lines of code (the chunk at 330-336) invoke an "image"
> >>> method on an object of a class defined in the package, which in turn
> >>> computes a matrix from items inside the object and calls image.default,
> >>> which is presumably where the error is coming from.
> >>>
> >>> Two questions: (1) How 

Re: [Rd] Not documenting a function and not getting a check error?

2023-01-06 Thread Kevin Coombes
I am fairly certain that the check for documentation is really just a check
for the presence of the function name in an "alias" line. My circumstantial
evidence, from a package in the early stages of development, came from
changing the name of a function. I updated everything else (usage,
examples, etc.) but forgot to change the alias. Got a warning that the
newly named function was not documented. It took me a while to figure out
why R CMD check was still complaining.

I am also pretty sure that, when looking for help in at least one existing
package (can't remember which one), I clicked on the link and got sent to a
page that said absolutely nothing about the function I was interested in.

On Fri, Jan 6, 2023, 4:48 AM Duncan Murdoch 
wrote:

> On 05/01/2023 10:10 p.m., Deepayan Sarkar wrote:
> > On Fri, Jan 6, 2023 at 1:49 AM Duncan Murdoch 
> wrote:
> >>
> >> I'm in the process of a fairly large overhaul of the exports from the
> >> rgl package, with an aim of simplifying maintenance of the package.
> >> During this work, I came across the reverse dependency geomorph that
> >> calls the rgl.primitive function.
> >>
> >> I had forgotten that rgl.primitive was still exported:  I've been
> >> thinking of it as an internal function for a few years now.  I was
> >> surprised geomorph was able to call it.
> >>
> >> Particularly surprising to me was the fact that it is not properly
> >> documented.  One of the help topics lists it as an alias, but it
> >> contains no usage info, and is not mentioned in the .Rd file other than
> >> the alias.  And yet "R CMD check rgl" has never complained about it.
> >>
> >> Is this intentional?
> >
> > Does the Rd file that documents it have \keyword{internal}? These are
> > not checked fully (as I realized recently while working on the help
> > system), and I guess that's intentional.
>
> No, not marked internal.  Here's a simple example:  a package that
> exports f and g, and has only one help page:
>
> -
> NAMESPACE:
> -
> export(f, g)
> -
>
> -
> R/source.R:
> -
> f <- function() "this is f"
>
> g <- function() "this is g"
> -
>
> -
> man/f.Rd:
> -
> \name{f}
> \alias{f}
> \alias{g}
> \title{
> This is f.
> }
> \description{
> This does nothing
> }
> \usage{
> f()
> }
> -
>
>
> No complaints about the lack of documentation of g.
>
> Duncan Murdoch
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[R-pkg-devel] NAMESPACE issue and grDevices

2019-08-24 Thread Kevin Coombes

Hi,

I've been building a package and hit a NAMESPACE issue that took a while 
to resolve.


My package implements a (new) generic function, and the method for a 
package in the class calls the "smoothScatter" function from the 
"graphics" package. I could build and install the package successfully, 
and when I used the generic function, it worked just fine.


However, when I ran "R CMD check --as-cran", it threw an error from both 
the examples in man pages and a test script. Specifically, the error was 
"package KernSmooth not available". I eventually worked around this 
problem by changing the DESCRIPTION file to "IMPORT KernSmooth" (even 
though I didn't have to actually import anything in "NAMESPACE").


The underlying issue appears to be that
    graphics::smoothScatter
calls a non-exported function from "grDevices"
 grDevices:::.smoothScatterCalcDensity
which in turn calls an explicitly qualified function from "KernSmooth"
    KernSmooth::bkde2D

To complicate matters
    graphics IMPORTs grDevices
but
    grDevices only SUGGESTs KernSmooth.

Since my package already IMPORTed the graphics package and had
    importFrom("graphics", "smoothScatter")
in the NAMESPACE, I was surprised that I had to track back through the 
code to find this dependency and had to make it explicitly known within 
the interface to my package.
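
Concretely, the state I ended up with looks like this (a sketch of the 
relevant lines only):

In DESCRIPTION:
    Imports: graphics, KernSmooth

In NAMESPACE:
    importFrom("graphics", "smoothScatter")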


Is there something else my package should do? Or should "grDevices" 
actually IMPORT "KernSmooth"?


Best,
  Kevin

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] testing coercion

2018-12-06 Thread Kevin Coombes
Thanks for the suggestion. But it may not be quite expressive enough; I
think I will instead use

 > ifelse(getRversion() < "3.6.0",
 >        !canCoerce(wv, "double"),   # fails in older versions
 >        canCoerce(wv, "double"))    # works in newer versions

That way, when I update the package six months from now, I might have a
chance of remembering what was going on.
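
Written out as a scalar if/else, the same check reads (with "wv" as in
my test script):

 > ok <- if (getRversion() < "3.6.0") {
 >   !canCoerce(wv, "double")   # coercion to "double" fails before the fix
 > } else {
 >   canCoerce(wv, "double")    # works once PR#17457 is in
 > }
 > stopifnot(ok)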

Best,
   Kevin

On 12/6/2018 11:41 AM, Sebastian Meyer wrote:
> Hi Kevin,
>
> I think using
>
>> canCoerce(wv, "double") || getRversion() < "3.6.0"
> could solve the issue of an inconsistent test result and is descriptive.
>
> Best regards,
>
>   Sebastian
>
>
> Am 06.12.18 um 16:59 schrieb Kevin Coombes:
>>   Hi,
>>
>> A package I recently submitted to CRAN includes an S4 class that defines a
>> method to coerce from that class to a numeric vector, basically through
>>
>>>   setAs("myClass", "numeric", function(from) from@psi)
>> Being perhaps overly compulsive about trying to test everything while
>> documenting behavior, my accompanying test script includes these lines:
>>
>>> try( as.numeric(wv) )# should fail
>>> canCoerce(wv, "numeric")   # returns TRUE
>>> canCoerce(wv, "double") # gets a complaint from CRAN incoming pretest
>> The complaint on the last line arises because
>> + in the current version of R, the answer is FALSE
>> + in the development version of R, the answer is TRUE
>> The change is (probably) documented on the R-devel daily news from 04 Sept
>> 2018, which lists
>>
>> Bug Fix:  as(1L, "double") now does coerce (PR#17457)
>>
>> So, here's my question. Do I remove this test (and thus lose my
>> documentation of how the method behaves) in order to get the incoming
>> precheck to stop whinging? Or do I tell the CRAN maintainers to not worry
>> about the test changing, since it's due to a core bug fix, and promise to
>> change my saved test results when the devel version becomes current? (Even
>> though the fixed results will then show up as wrong in the "oldrel" tests.)
>>
>> Best,
>>Kevin
>>
>>
>> __
>> R-package-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-17 Thread Kevin Coombes
Yes; but I have been running around all day without time to sit down and
try them. The suggestions make sense, and I'm looking forward to
implementing them.

On Thu, May 17, 2018, 3:55 PM Ben Bolker <bbol...@gmail.com> wrote:

> There have been various comments in this thread (by me, and I think
> Duncan Murdoch) about how you can identify the platform you're running
> on (some combination of .Platform and/or R.Version()) and use it to
> write conditional statements so that your tests will only be compared
> with reference values that were generated on the same platform ... did
> those get through?  Did they make sense?
>
> On Thu, May 17, 2018 at 3:30 PM, Kevin Coombes
> <kevin.r.coom...@gmail.com> wrote:
> > Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are
> > the issue. The matrices I am using are all nonsingular, and the various
> > algorithms have no problem computing the eigenvalues correctly (up to
> > numerical errors that I can bound and thus account for on tests by
> rounding
> > appropriately). But an eigenvalue of multiplicity M has an M-dimensional
> > eigenspace with no preferred basis. So, any M-dimensional  (unitary)
> change
> > of basis is permitted. That's what gives rise to the lack of
> reproducibility
> > across architectures. The choice of basis appears to use different
> > heuristics on 32-bit windows than on 64-bit Windows or Linux machines.
> As a
> > result, I can't include the tests I'd like as part of a CRAN submission.
> >
> > On Thu, May 17, 2018, 2:29 PM William Dunlap <wdun...@tibco.com> wrote:
> >
> >> Your explanation needs to be a bit more general in the case of identical
> >> eigenvalues - each distinct eigenvalue has an associated subspace, whose
> >> dimension is the number of repeats of that eigenvalue, and the eigenvectors
> for
> >> that eigenvalue are an orthonormal basis for that subspace.  (With no
> >> repeated eigenvalues this gives your 'unique up to sign'.)
> >>
> >> E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of
> 0
> >>
> >>   > x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
> >>   > x
> >>[,1] [,2] [,3] [,4] [,5]
> >>   [1,]10001
> >>   [2,]01001
> >>   [3,]00101
> >>   [4,]00000
> >>   [5,]11103
> >> the following give valid but different (by more than sign) eigen vectors
> >>
> >> e1 <- structure(list(values = c(4, 1, 0.999, 0,
> >> -2.22044607159862e-16
> >> ), vectors = structure(c(-0.288675134594813, -0.288675134594813,
> >> -0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
> >> -0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
> >> -0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
> >> -0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values",
> >> "vectors"), class = "eigen")
> >> e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
> >> vectors = structure(c(0.288675134594813, 0.288675134594813,
> >> 0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
> >> 0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
> >> 0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
> >> -1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5,
> >> 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values", "vectors"
> >> ), class = "eigen")
> >>
> >> I.e.,
> >> > all.equal(crossprod(e1$vectors), diag(5), tol=0)
> >> [1] "Mean relative difference: 1.407255e-15"
> >> > all.equal(crossprod(e2$vectors), diag(5), tol=0)
> >> [1] "Mean relative difference: 3.856478e-15"
> >> > all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)
> >> [1] "Mean relative difference: 1.110223e-15"
> >> > all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)
> >> [1] "Mean relative difference: 9.069735e-16"
> >>
> >> > e1$vectors
> >>[,1]   [,2]  [,3] [,4] [,5]
> >> [1,] -0.2886751  0.000  8.164966e-010 -0.5
> >> [2,] -0.2886751  0.7071068 -4.082483e-010 -0.5
> >> [3,] -0.2886751 -0.7071068 -4.082483e-010 -0.5
> >> [4,]  0.000  0.000  0.00e+00   -1  0.0
> >> [5,] -0.8660254  0.0

Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-17 Thread Kevin Coombes
Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are
the issue. The matrices I am using are all nonsingular, and the various
algorithms have no problem computing the eigenvalues correctly (up to
numerical errors that I can bound and thus account for on tests by rounding
appropriately). But an eigenvalue of multiplicity M has an M-dimensional
eigenspace with no preferred basis. So, any M-dimensional  (unitary) change
of basis is permitted. That's what gives rise to the lack of reproducibility
across architectures. The choice of basis appears to use different
heuristics on 32-bit windows than on 64-bit Windows or Linux machines. As a
result, I can't include the tests I'd like as part of a CRAN submission.

On Thu, May 17, 2018, 2:29 PM William Dunlap <wdun...@tibco.com> wrote:

> Your explanation needs to be a bit more general in the case of identical
> eigenvalues - each distinct eigenvalue has an associated subspace, whose
> dimension is the number of repeats of that eigenvalue, and the eigenvectors for
> that eigenvalue are an orthonormal basis for that subspace.  (With no
> repeated eigenvalues this gives your 'unique up to sign'.)
>
> E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of 0
>
>   > x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
>   > x
>        [,1] [,2] [,3] [,4] [,5]
>   [1,]    1    0    0    0    1
>   [2,]    0    1    0    0    1
>   [3,]    0    0    1    0    1
>   [4,]    0    0    0    0    0
>   [5,]    1    1    1    0    3
> the following give valid but different (by more than sign) eigen vectors
>
> e1 <- structure(list(values = c(4, 1, 0.999, 0,
> -2.22044607159862e-16
> ), vectors = structure(c(-0.288675134594813, -0.288675134594813,
> -0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
> -0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
> -0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
> -0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values",
> "vectors"), class = "eigen")
> e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
> vectors = structure(c(0.288675134594813, 0.288675134594813,
> 0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
> 0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
> 0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
> -1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5,
> 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values", "vectors"
> ), class = "eigen")
>
> I.e.,
> > all.equal(crossprod(e1$vectors), diag(5), tol=0)
> [1] "Mean relative difference: 1.407255e-15"
> > all.equal(crossprod(e2$vectors), diag(5), tol=0)
> [1] "Mean relative difference: 3.856478e-15"
> > all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)
> [1] "Mean relative difference: 1.110223e-15"
> > all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)
> [1] "Mean relative difference: 9.069735e-16"
>
> > e1$vectors
>            [,1]       [,2]          [,3] [,4] [,5]
> [1,] -0.2886751  0.0000000  8.164966e-01    0 -0.5
> [2,] -0.2886751  0.7071068 -4.082483e-01    0 -0.5
> [3,] -0.2886751 -0.7071068 -4.082483e-01    0 -0.5
> [4,]  0.0000000  0.0000000  0.000000e+00   -1  0.0
> [5,] -0.8660254  0.0000000 -6.106227e-16    0  0.5
> > e2$vectors
>           [,1]          [,2]          [,3] [,4] [,5]
> [1,] 0.2886751 -7.844376e-01  2.265489e-01    0 -0.5
> [2,] 0.2886751  5.884158e-01  5.660684e-01    0 -0.5
> [3,] 0.2886751  1.960217e-01 -7.926173e-01    0 -0.5
> [4,] 0.0000000  0.000000e+00  0.000000e+00   -1  0.0
> [5,] 0.8660254  4.464109e-17 -1.112441e-16    0  0.5
>
>
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, May 17, 2018 at 10:14 AM, Martin Maechler <
> maech...@stat.math.ethz.ch> wrote:
>
>> >>>>> Duncan Murdoch 
>> >>>>> on Thu, 17 May 2018 12:13:01 -0400 writes:
>>
>> > On 17/05/2018 11:53 AM, Martin Maechler wrote:
>> >>>>>>> Kevin Coombes ... on Thu, 17
>> >>>>>>> May 2018 11:21:23 -0400 writes:
>>
>> >>[..]
>>
>> >> > [3] Should the documentation (man page) for "eigen" or
>> >> > "mvrnorm" include a warning that the results can change
>> >> > from machine to machine (or between things like 32-bit and
>> >> > 64-bit R on the same machine) because of difference in
>> >> > linear alge

[R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-17 Thread Kevin Coombes
Hi,

I wrote and maintain the Thresher package. It includes code to do
simulations. In the "tests" directory of the package, I do some simple
simulations and run the main algorithm, then write out summaries of the
results

The initial submission of the package to CRAN was delayed because the
"Rout.save" files matched the "Rout" files on 64-bit R but *not*  on 32-bit
R on Windows. After investigating, I realized that when my simulation code
called "MASS::mvrnorm", I got different results from 64-bit and 32-bit
versions of R on the same machine.  Pushing further, I determined that this
was happening because mvrnorm used "eigen" to compute the eigenvalues and
eigenvectors, and "eigen" itself gave different answers in the two R
versions.

The underlying issue (mathematically) is that the correlation/covariance
matrix I was using had repeated eigenvalues, and so there is no unique
choice of basis for the associated eigenspace. This observation suggests
that the issue is potentially more general than 32-bit versus 64-bit; the
results will depend on the implementation of the eigen-decomposition in
whatever linear algebra module is compiled along with R, so it can change
from machine to machine.

I "solved" (well, worked around) the immediate problem with package
submission by changing the test code to not write out anything that might
differ between versions.

With all of that as background, here are my main questions:

[1] Is there any way to put something into the "tests" directory that would
allow me to use these simulations for what computer scientists call
regression testing? (That is, to make sure my changes to the code haven't
changed results in an unexpected way.)

[2] Should there be a flag or instruction to R CMD check that says to only
run or interpret this particular test on a specific version or machine? (Or
is there already such a flag that I don't know about?)

[3] Should the documentation (man page) for "eigen" or "mvrnorm" include a
warning that the results can change from machine to machine (or between
things like 32-bit and 64-bit R on the same machine) because of difference
in linear algebra modules? (Possibly including the statement that
"set.seed" won't save you.)

You can reproduce my example by running this code in different R
versions/machines:

sig <- matrix(0, 16, 16)
sig[1:10, 1:10] <- 0.5     # first block of correlated variables
sig[11:16, 11:16] <- 0.5   # second block
diag(sig) <- 1
eigen(sig)  # eigenvalues 5.5 and 3.5 once each, plus 0.5 with multiplicity 14
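
For what it's worth, one basis-free summary that could be tested safely
(a sketch): compare the projector onto the repeated eigenspace, which is
unique even though the basis spanning it is not.

e <- eigen(sig)
repeated <- abs(e$values - 0.5) < 1e-8    # the multiplicity-14 eigenvalue
P <- tcrossprod(e$vectors[, repeated])    # projection matrix V %*% t(V)
round(P, 8)                               # stable across architectures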

Best,
  Kevin


__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Milestone: 12,000 packages on CRAN

2017-12-15 Thread Kevin Coombes

Cool.

Since I got a package accepted overnight, I'm going to take credit for 
being #12,000


It does look like the apparent exponential growth in packages may have 
finally come to an end, however, collapsing back to something nearly 
linear.  Note that under an exponential growth model, CRAN will reach 
15,000 packages at about Thanksgiving 2018. Under a now-linear model, 
that milestone won't occur until some time in March 2019.  Remember, you 
read it here first.


  Kevin

On 12/15/2017 8:29 AM, Henrik Bengtsson wrote:

CRAN reached 12,000 packages [1] today (December 15, 2017).

A huge thank you to the CRAN team!

Milestones:

2017-12-15 12000 pkgs (+6.1/day over 165 days) 6910 mnts (+3.2/day)
2017-07-04 11000 pkgs (+6.3/day over 159 days) 6377 mnts (+3.3/day)
2017-01-27 10000 pkgs (+6.3/day over 158 days) 5845 mnts (+3.5/day)
2016-08-22 9000 pkgs (+5.7/day over 175 days) 5289 mnts (+5.8/day)
2016-02-29 8000 pkgs (+5.0/day over 201 days) 4279 mnts (+0.7/day)
2015-08-12 7000 pkgs (+3.4/day over 287 days) 4130 mnts (+2.4/day)
2014-10-29 6000 pkgs (+3.0/day over 335 days) 3444 mnts (+1.6/day)
2013-11-08 5000 pkgs (+2.7/day over 442 days) 2900 mnts (+1.2/day)
2012-08-23 4000 pkgs (+2.1/day over 469 days) 2350 mnts
2011-05-12 3000 pkgs (+1.7/day over 585 days)
2009-10-04 2000 pkgs (+1.1/day over 906 days)
2007-04-12 1000 pkgs
2004-10-01 500 pkgs
2003-04-01 250 pkgs
2002-09-17 68 pkgs
1997-04-23 12 pkgs

These data are for CRAN only [1-14]. There are many more packages
elsewhere, e.g. Bioconductor, GitHub, R-Forge etc.

[1] https://cran.r-project.org/web/packages/
[2] https://en.wikipedia.org/wiki/R_(programming_language)#Milestones
[3] https://www.r-pkg.org/
[4] Legacy data collected privately
[5] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
[6] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
[7] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
[8] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
[9] https://stat.ethz.ch/pipermail/r-devel/2013-November/067935.html
[10] https://stat.ethz.ch/pipermail/r-devel/2014-October/069997.html
[11] https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000393.html
[12] https://stat.ethz.ch/pipermail/r-devel/2016-February/072388.html
[13] https://stat.ethz.ch/pipermail/r-devel/2016-August/073011.html
[14] Local CRAN mirror data (https://cran.r-project.org/mirror-howto.html)

All the best,

Henrik
(just one of many)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Fwd: Re: RSiteSearch, sos, rdocumentation.org, ...?

2016-09-08 Thread Kevin Coombes
Would it make sense to recreate the "searchable R help pages" by feeding 
them all into Elasticsearch, which would automatically index them and 
also provide an extensive (HTTP+JSON-based) API for performing complex 
searches?
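
A bare-bones sketch of what that could look like (the index name, field
names, and local node URL are all assumptions for illustration):

library(httr)
doc <- list(package = "stats", topic = "lm",
            title   = "Fitting Linear Models",
            text    = "lm is used to fit linear models ...")
POST("http://localhost:9200/r-help/_doc", body = doc, encode = "json")

## and a full-text query over everything indexed so far:
POST("http://localhost:9200/r-help/_search",
     body = list(query = list(match = list(text = "robust regression"))),
     encode = "json")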


On 9/8/2016 10:31 AM, Jonathan Baron wrote:

On 09/08/16 07:09, John Merrill wrote:
Given Google's commitment to R, I don't think that they'd be at all averse 
to supporting a custom search box on the package page. It might well be a 
good thing for "someone" to examine the API for setting up such a page and 
to investigate how to mark the main CRAN page as searchable.


The main CRAN page is not ideal. We need to be able to search the help
files. My site has only the html help files for each package (except
the ones I use, which are fully installed), so someone should
re-create that. The CRAN page has a "Reference manual" in pdf for
every package, but the individual functions are not separated.

But, yes, Google would work, even for my page. And the sos package
would have to be modified for that. As I said, I'm not going to do
this. But I would welcome it.

Jon




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Bioc-devel] Advice (was Re: CRAN package with Bioconductor dependencies)

2015-03-04 Thread Kevin Coombes

HI,

I'm following this discussion with interest, for the following reason.  
There are more than a dozen packages that I have written and still 
maintain.  Most of them were started while I was at M.D. Anderson.  
They were served from a highly non-mainstream repository hosted there, 
with the code managed in a local Subversion repository behind their 
firewall.  Since moving to Ohio State, I transferred the code to 
R-Forge.  (If you want to figure out what the packages are and do, 
search for OOMPA.)  So, it's still in a non-mainstream repository, 
but it's (to continue the metaphor) at least on a bigger tributary than 
it used to be.


Many of the packages are written to be compatible with some of the core 
BioConductor classes, which means that they import Biobase.


But all of the functionality is available without using BioConductor 
(provided the user is willing to assemble the data into the correct set 
of matrices).


I've been thinking about submitting it to either CRAN or BioConductor.  
Which makes more sense?


Best,
  Kevin

On 3/4/2015 4:27 PM, Laurent Gatto wrote:

On  3 March 2015 06:07, Henrik Bengtsson wrote:


Not that long ago the DESCRIPTION field 'Additional_repositories' was
introduced with the purpose of providing references to non-mainstream
package repositories, e.g. R-Forge.  Interestingly, by "mainstream"
they mean CRAN and Bioconductor.  The 'Additional_repositories' field
is also enforced for CRAN packages depending on non-mainstream packages,
where "depending on" can mean any package under Depends, Imports,
Suggests and (I guess) LinkingTo and Enhances.

Thanks, Henrik!

If I understand well, Bioconductor is considered a mainstream repository
and so is not expected to be added as an Additional_repository (despite
the fact that install.packages does not install the Bioc repository by
default). The issue with doing so nevertheless would be that the CRAN
maintainers might complain and that this would break the tied R/Bioc
versions.
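
(For completeness, the manual workaround is something like the sketch
below; the repository indices depend on the local "repositories" file,
so treat them as an assumption.)

setRepositories(ind = 1:2)   # typically 1 = CRAN, 2 = BioC software
install.packages("MSeasy")   # Bioc dependencies such as mzR now resolve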

Best wishes,

Laurent


I bet that in a, hopefully, not too far future, we'll find that
install.packages() will install from not only CRAN by default, but
also Bioconductor and whatever Additional_repositories suggests.  As
usual, the bet is about food and drinks in person wherever/whenever
feasible.


BTW, I have a few feature requests related to Bioc releases/versions:

1. Add release date to online announcement pages online, e.g.
http://bioconductor.org/news/bioc_2_14_release/


2. A data.frame listing Bioc versions and their release dates (maybe
even time stamps), e.g.


biocVersions()

1.0 2002-04-29
...
2.14 2014-04-14
3.0 2014-10-14
3.1 2015-04-17


3. As far as I understand it, the recommended Bioc version to use
depends on R version and the date (in the past only R version).  I
would like to have a function that returns the Bioc version as a
function of R version and date.  Maybe BiocInstaller::biocVersion()
could be extended with this feature, e.g.

biocVersion <- function(date, rversion) {
   ## Current?
   if (missing(date) && missing(rversion)) return(BIOC_VERSION)

   if (missing(date)) date <- Sys.Date()
   date <- as.Date(date)
   if (missing(rversion)) rversion <- getRversion()

   ## Lookup by (rversion, date) from known releases
   ## and make best guesses for the future (with a warning)
   ...
}

If such a function could be available as a light-weight script online,
then the proper Bioc repos could be downloaded by
tools:::.BioC_version_associated_with_R_version(), cf. Martin's reply
on lagging Bioc versions.  This would bring us one step closer to
installing Bioc packages using install.packages(), cf. Laurent's
original post. Because it may not be clear to an R user that they need
to go to Bioconductor because a CRAN package depends on a Bioc
package. That user might not even have heard of Bioconductor. Not
suggesting biocLite() should be replaced, but the gap for using
install.packages() could be made smaller.  ... and maybe one day we'll
have an omnibus package installer/updater available in a fresh R
installation.

The above biocVersion() function would also be useful for figuring out
what R/BioC version was in use at a certain year in the past (e.g.
reproducing old work) and for finding out versions of Bioc
release/devel packages back in time (e.g. if you try to be backward
compatible).

Thxs,

Henrik

On Mon, Mar 2, 2015 at 3:41 PM, Laurent Gatto lg...@cam.ac.uk wrote:

Thank you all for your answers.

Laurent

On  2 March 2015 23:27, Martin Morgan wrote:


On 03/02/2015 03:18 PM, Laurent Gatto wrote:

Dear all,


I had never realised that CRAN packages that depended on Bioc packages
could actually not be installed with install.packages without setting a
> repo or using BiocInstaller::biocLite. Here is an example using a fresh R
installation

http://cran.r-project.org/web/packages/MSeasy/index.html
Depends: amap, clValid, cluster, fpc, mzR, xcms

$ docker run --rm -ti rocker/r-base R

R version 3.1.2 (2014-10-31) -- Pumpkin Helmet
Copyright (C) 

Re: [Bioc-devel] plotPCA for BiocGenerics

2014-10-20 Thread Kevin Coombes
Hi,

It depends.

The traditional R approach to these matters is that you (a) first 
perform some sort of an analysis and save the results as an object and 
then (b) show or plot what you got.  It is part (b) that tends to be 
really generic, and (in my opinion) should have really generic names -- 
like "show" or "plot" or "hist" or "image".

With PCA in particular, you usually have to perform a bunch of 
computations in order to get the principal components from some part of 
the data.  As I understand it now, these computations are performed 
along the way as part of the various plotPCA functions. The R way to 
do this would be something like
 pca <- performPCA(mySpecialObject)  # or as.PCA(mySpecialObject)
 plot(pca) # to get the scatter plot
This approach has the user-friendly advantage that you can tweak the 
plot (in terms of colors, symbols, ranges, titles, etc.) without having 
to recompute the principal components every time. (I often find myself 
re-plotting the same PCA several times, with different colors or symbols 
for different factors associated with the samples.) In addition, you 
could then also do something like
 screeplot(pca)
to get a plot of the percentages of variance explained.

My own feeling is that if the object doesn't know what to do when you 
tell it to plot itself, then you haven't got the right abstraction.

You may still end up needing generics for each kind of computation you 
want to perform (PCA, RLE, MA, etc.), which is why I suggested an 
"as.PCA" function.  After all, "as" is already pretty generic.  In the 
long run, this would help BioConductor developers, since they 
wouldn't all have to reimplement the visualization code; they would just 
have to figure out how to convert their own object into a PCA or RLE or 
MA object.
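
To make the shape of this concrete, here is a rough sketch (the class
and slot names are mine, purely for illustration):

library(methods)

setClass("PCA", representation(scores = "matrix", variances = "numeric"))

## each package implements only the conversion from its own class
setGeneric("as.PCA", function(object, ...) standardGeneric("as.PCA"))

## the scatter plot is written once, against the "PCA" class
setMethod("plot", signature("PCA", "missing"), function(x, y, ...) {
  plot(x@scores[, 1], x@scores[, 2],
       xlab = "Component 1", ylab = "Component 2", ...)
})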

And I know that this plotWhatever approach is used elsewhere in 
BioConductor, and it has always bothered me. It just seemed that a post 
suggesting a new generic function provided a reasonable opportunity to 
point out that there might be a better way.

Best,
   Kevin

PS: My own ClassDiscovery package, which is available from R-Forge via
 install.packages("ClassDiscovery", repos = "http://R-Forge.R-project.org")
includes a SamplePCA class that does something roughly similar to this 
for microarrays.

PPS (off-topic): The worst offender in base R -- because it doesn't use 
this typical approach -- is the "heatmap" function.  Having tried to 
teach this function in several different classes, I have come to the 
conclusion that it is basically unusable by mortals. And I think the 
problem is that it tries to combine too many steps -- clustering rows, 
clustering columns, scaling, visualization -- all in a single function.

On 10/20/2014 3:47 PM, davide risso wrote:
 Hi Kevin,

 I don't agree. In the case of EDASeq (as I suppose it is the case for 
 DESeq/DESeq2) plotting the principal components of the count matrix is 
 only one of possible exploratory plots (RLE plots, MA plots, etc.).
 So, in my opinion, it makes more sense from an object oriented point 
 of view to have multiple plotting methods for a single RNA-seq 
 experiment object.

 In addition, this is the same strategy adopted elsewhere in 
 Bioconductor, e.g., for the plotMA method.

 Just my two cents.

 Best,
 davide

 On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes 
 kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com wrote:

 I understand that breaking code is a problem, and that is
 admittedly the main reason not to immediately adopt my suggestion.

 But as a purely logical exercise, creating a PCA object X or
 something similar and using either
 plot(X)
 or
 plot(as.PCA(mySpecialObject))
 is a much more sensible use of object-oriented programming/design.
 This requires no new generics (to write or to learn).

 And you could use it to transition away from the current system by
 convincing the various package maintainers to re-implement plotPCA
 as follows:

 plotPCA <- function(object, ...) {
   plot(as.PCA(object), ...)
 }

 This would be relatively easy to eventually deprecate and teach
 users to switch to the alternative.


 On 10/20/2014 1:07 PM, Michael Love wrote:
 hi Kevin,

 that would imply there is only one way to plot an object of a
 given class. Additionally, it would break a lot of code.​

 best,

 Mike

 On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
 kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com wrote:

 But shouldn't they all really just be named plot for the
 appropriate objects?  In which case, there would already be a
 perfectly good generic

 On Oct 20, 2014 10:27 AM, Michael Love
 michaelisaiahl...@gmail.com
 mailto:michaelisaiahl...@gmail.com wrote:

 I noticed that 'plotPCA' functions are defined in EDASeq,
 DESeq2, DESeq,
 affycoretools, Rcade

Re: [Bioc-devel] plotPCA for BiocGenerics

2014-10-20 Thread Kevin Coombes
Hi,

I don't see how it needs more functions (as long as you can get 
developers to agree).  Suppose that someone can define a reusable PCA 
class.  This will contain a single plot generic function, defined once 
and reused by other classes. The existing plotPCA interface can also 
be implemented just once, in this class, as

 plotPCA <- function(object, ...) plot(as.PCA(object), ...)

This can be exposed to users of your class through namespaces.  Then the 
only thing a developer needs to implement in his own class is the single 
"as.PCA" function.  And he/she would have already been required to 
implement this as part of the old plotPCA function.  So it can be 
extracted from that, and the developer doesn't have to reimplement the 
visualization code from the PCA class.

Best,
   Kevin

On 10/20/2014 5:15 PM, davide risso wrote:
 Hi Kevin,

 I see your points and I agree (especially for the specific case of 
 plotPCA that involves some non trivial computations).

 On the other hand, having a wrapper function that starting from the 
 raw data gives you a pretty picture (with virtually zero effort by 
 the user) using a sensible choice of parameters that are more or less 
 OK for RNA-seq data is useful for practitioners that just want to look 
 for patterns in the data.

 I guess it would be the same to have a PCA method for each of the 
 objects and then using the plot method on those new objects, but that 
 would just create a lot more objects and functions than the current 
 approach (like Mike was saying).

 Your as.pca or performPCA approach would be definitely better if 
 all the different methods would create objects of the *same* PCA 
 class, but since we are talking about different packages, I don't know 
 how easy it would be to coordinate. But perhaps this is the way we 
 should go.

 Best,
 davide



 On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes 
 kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com wrote:

 Hi,

 It depends.

 The traditional R approach to these matters is that you (a)
 first perform some sort of an analysis and save the results as an
 object and then (b) show or plot what you got.  It is part (b)
 that tends to be really generic, and (in my opinion) should have
 really generic names -- like show or plot or hist or image.

 With PCA in particular, you usually have to perform a bunch of
 computations in order to get the principal components from some
 part of the data.  As I understand it now, these computations are
 performed along the way as part of the various plotPCA
 functions.  The R way to do this would be something like
 pca <- performPCA(mySpecialObject)  # or as.PCA(mySpecialObject)
 plot(pca) # to get the scatter plot
 This apporach has the user-friendly advantage that you can tweak
 the plot (in terms of colors, symbols, ranges, titles, etc)
 without having to recompute the principal components every time.
 (I often find myself re-plotting the same PCA several times, with
 different colors or symbols for different factrors associated with
 the samples.) In addition, you could then also do something like
 screeplot(pca)
 to get a plot of the percentages of variance explained.

 My own feeling is that if the object doesn't know what to do when
 you tell it to plot itself, then you haven't got the right
 abstraction.

 You may still end up needing generics for each kind of computation
 you want to perform (PCA, RLE, MA, etc), which is why I suggested
 an as.PCA function.  After all, as is already pretty generic. 
 In the long run, this would help BioConductor developers, since
 they wouldn't all have to reimplement the visualization code; they
 would just have to figure out how to convert their own object into
 a PCA or RLE or MA object.

 And I know that this plotWhatever approach is used elsewhere in
 BioConductor, and it has always bothered me. It just seemed that a
 post suggesting a new generic function provided a reasonable
 opportunity to point out that there might be a better way.

 Best,
   Kevin

 PS: My own ClassDicsovery package, which is available from
 RForge via
 **|install.packages(ClassDiscovery,
 repos=http://R-Forge.R-project.org;
 http://R-Forge.R-project.org)|**
 includes a SamplePCA class that does something roughly similar
 to this for microarrays.

 PPS (off-topic): The worst offender in base R -- because it
 doesn't use this typical approch -- is the heatmap function. 
 Having tried to teach this function in several different classes,
 I have come to the conclusion that it is basically unusable by
 mortals. And I think the problem is that it tries to combine too
 many steps -- clustering rows, clustering columns, scaling,
 visualization -- all in a single fiunction


 On 10/20/2014 3:47 PM, davide risso wrote

Re: [Bioc-devel] plotPCA for BiocGenerics

2014-10-20 Thread Kevin Coombes
Well. I have two responses to that.

First, I think it would be a lot better/easier for users if (most) 
developers could make use of the same plot function for basic classes 
like PCA.

Second, if you think the basic PCA plotting routine needs enhancements, 
you still have two options.  On the one hand, you could (as you said) 
try to convince the maintainer of PCA to add what you want.  If it's 
generally valuable, then he'd probably do it --- and other classes that 
use it would benefit.  On the other hand, if it really is a special 
enhancement that only makes sense for your class, then you can derive a 
class from the basic PCA class
 setClass("mySpecialPCA", contains = c("PCA"), *other stuff here*)
  and implement your own version of the plot generic for this class.  
And you could tweak the as.PCA function so it returns an object of the 
mySpecialPCA class. And the user could still just plot the result 
without having to care what's happening behind the scenes.

On 10/20/2014 5:59 PM, Michael Love wrote:
 Ah, I see now. Personally, I don't think Bioconductor developers 
 should have to agree on single plotting functions for basic classes 
 like 'PCA' (because this logic applies equally to the situation of all 
 Bioconductor developers agreeing on single MA-plot, a single 
 variance-mean plot, etc). I think letting developers define their 
 plotPCA makes contributions easier (I don't have to ask the owner of 
 plot.PCA to incorporate something), even though it means we have a 
 growing list of generics.

 Still you have a good point about splitting computation and plotting. 
 In practice, we subset the rows so PCA is not laborious.


 On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes 
 kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com wrote:

 Hi,

 I don't see how it needs more functions (as long as you can get
 developers to agree).  Suppose that someone can define a reusable
 PCA class.  This will contain a single plot generic function,
 defined once and reused by other classes. The existing plotPCA
 interface can also be implemented just once, in this class, as

 plotPCA <- function(object, ...) plot(as.PCA(object), ...)

 This can be exposed to users of your class through namespaces. 
 Then the only thing a developer needs to implement in his own
 class is the single as.PCA function.  And he/she would have
 already been rquired to implement this as part of the old
 plotPCA function.  So it can be extracted from that, and the
 developer doesn't have to reimplement the visualization code from
 the PCA class.

 Best,
   Kevin


 On 10/20/2014 5:15 PM, davide risso wrote:
 Hi Kevin,

 I see your points and I agree (especially for the specific case
 of plotPCA that involves some non trivial computations).

 On the other hand, having a wrapper function that starting from
 the raw data gives you a pretty picture (with virtually zero
 effort by the user) using a sensible choice of parameters that
 are more or less OK for RNA-seq data is useful for practitioners
 that just want to look for patterns in the data.

 I guess it would be the same to have a PCA method for each of the
 objects and then using the plot method on those new objects, but
 that would just create a lot more objects and functions than the
 current approach (like Mike was saying).

 Your as.pca or performPCA approach would be definitely better
 if all the different methods would create objects of the *same*
 PCA class, but since we are talking about different packages, I
 don't know how easy it would be to coordinate. But perhaps this
 is the way we should go.

 Best,
 davide



 On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
 kevin.r.coom...@gmail.com mailto:kevin.r.coom...@gmail.com wrote:

 Hi,

 It depends.

 The traditional R approach to these matters is that you (a)
 first perform some sort of an analysis and save the results
 as an object and then (b) show or plot what you got.  It is
 part (b) that tends to be really generic, and (in my opinion)
 should have really generic names -- like show or plot or
 hist or image.

 With PCA in particular, you usually have to perform a bunch
 of computations in order to get the principal components from
 some part of the data.  As I understand it now, these
 computations are performed along the way as part of the
 various plotPCA functions.  The R way to do this would be
 something like
 pca <- performPCA(mySpecialObject)  # or
 as.PCA(mySpecialObject)
 plot(pca) # to get the scatter plot
 This apporach has the user-friendly advantage that you can
 tweak the plot (in terms of colors, symbols, ranges, titles,
 etc) without having to recompute the principal components

Re: [Rd] R CMD check for the R code from vignettes

2014-06-02 Thread Kevin Coombes

"Doc, it hurts when I do this."
"So, don't do that."

If no one in R Core does anything about this issue (in terms of changing 
Sweave or Stangle), then the solution still remains very simple.  
Authors of vignettes should avoid using anything in \Sexpr{} that has a 
side effect. As long as they do that, the code will tangle correctly and 
produce the same result as Sweave.
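
For instance (an illustration made up for this point, not code from any
actual package):

% Bad: the assignment happens only during weaving; Stangle never sees it.
The cutoff is \Sexpr{cutoff <- 0.05}.

% Good: assign in a code chunk (which tangles); use \Sexpr only to display.
<<setCutoff>>=
cutoff <- 0.05
@
The cutoff is \Sexpr{cutoff}.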


R CMD check already detects other things which may or may not be 
outright errors but are viewed as bad practice. I think it is bad 
practice to put code with side effects into an Sexpr. So, I don't do 
that. If I did do that accidentally, I really wouldn't mind if R CMD 
check warned me about it.


  -- Kevin

On 6/2/2014 6:28 PM, Gavin Simpson wrote:

On 2 June 2014 15:59, Duncan Murdoch murdoch.dun...@gmail.com wrote:


On 03/06/2014, 4:12 AM, Gavin Simpson wrote:


On 2 June 2014 11:44, Duncan Murdoch murdoch.dun...@gmail.com wrote:



 Several of us have told you the real harm:  it means that users

 can't easily extract a script that replicates the computations done
 in the vignette.  That's a useful thing to be able to do.


Isn't the issue here that `tangle()` doesn't, currently, extract a
script that replicates the computations done in the vignette, but
rather does so only partially?


No, I think the issue is that some people don't want to have to guarantee
that the tangled source produces the same results.  R doesn't guarantee it,
it is up to the author to do so.


I think those issues have become conflated on this thread; R CMD check
issues raised the problem that side effects in \Sexpr may lead to tangle()
generating an R script that may not work or do so only incorrectly.

Whatever the ensuing discussion; the above issue is not ideal and as you
mention below it could be solved by not allowing side effects in \Sexpr,
fixing tangle so that \Sexpr is recorded, or some other workaround.



People seem to be arguing across one another throughout this thread.
Yihui has identified an infelicity in the tangle implementation. Turning
off tangling + sourcing in R CMD check may not be a desirable solution,
so if the aim is to extract R code to replicate the computations in the
vignette, tangle() needs to be modified to allow for (optional)
inclusion of \Sexpr chunks.


That's one solution, and the other is to limit \Sexpr code to things with
no side effects, as Sweave was originally designed.


That would be perfectly fine also; clarifying usage etc helps and whilst it
may inconvenience those authors that exploited the ambiguity, there is a
solution now that anyone can write their own vignette drivers.





To move this thread forwards, would contributions that added this
optional feature to tangle() be considered by R Core? If so, perhaps
those affected by the current infelicity might wish to propose patches
to the R sources which implement a solution?


As I said before, I'm more sympathetic to that solution than to dropping
the requirement that tangled code should work.  I think the changes to base
R need only be minimal:  only an extra argument to the driver code for the
tangling.  Users who want to use this feature should write their own (or
use someone else's if they don't mind an extra dependency) as a
non-Sweave vignette driver, whose implementation is to call Stangle with
the non-default parameter setting.

Duncan Murdoch


I agree, and given that the changes to base R would be minimal and yet
solve the problem for those wanting to allow tangle side effects in
\Sexpr (or allow them to solve it with a driver), it is disappointing to
note i) the length of this thread (!) and ii) the often irrelevant
arguments that some contributors have offered. (Do note this is not
directed specifically at you Duncan.)

It has not gone without notice of late the increasing regularity with which
threads here descend into irrelevant or antagonistic directions.

G



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD check for the R code from vignettes

2014-05-30 Thread Kevin Coombes

Hi,

Unless someone is planning to change Stangle to include inline 
expressions (which I am *not* advocating), I think that relying on 
side-effects within an \Sexpr construction is a bad idea. So, my own 
coding style is to restrict my use of \Sexpr to calls of the form 
\Sexpr{show.the.value.of.this.variable}. As a result, I more-or-less 
believe that having R CMD check use Stangle and report an error is 
probably a good thing.


There is a completely separate question about the relationship between 
Sweave/Stangle or knit/purl and literate programming that is linked to 
your question about whether to use Stangle on vignettes. The underlying 
model(s) in R have drifted away from Knuth's original conception, for 
some good reasons.


The original goal of literate programming was to be able to explain the 
algorithms and data structures in the code to humans.  For that purpose, 
it was important to have named code chunks that you could move around, 
which would allow you to describe the algorithm starting from a high 
level overview and then drilling down into the details. From this 
perspective, tangle was critical to being able to reconstruct a 
program that would compile and run correctly.


The vast majority of applications of Sweave/Stangle or knit/purl in 
modern R have a completely different goal: to produce some sort of 
document that describes the results of an analysis to a non-programmer 
or non-statistician.  For this goal, weave is much more important than 
tangle, because the most important aspect is the ability to integrate 
the results (figures, tables, etc) of running the code into the document 
that gets passed off to the person for whom the analysis was prepared. As 
a result, the number of times in my daily work that I need to explicitly 
invoke Stangle (or purl) is many orders of magnitude smaller 
than the number of times that I invoke Sweave (or knitr).


  -- Kevin


On 5/30/2014 1:04 AM, Yihui Xie wrote:

Hi,

Recently I saw a couple of cases in which the package vignettes were
somewhat complicated so that Stangle() (or knitr::purl() or other
tangling functions) can fail to produce the exact R code that is
executed by the weaving function Sweave() (or knitr::knit(), ...). For
example, this is a valid document that can pass the weaving process
but cannot generate a valid R script to be source()d:

\documentclass{article}
\begin{document}
Assign 1 to x: \Sexpr{x <- 1}
<<>>=
x + 1
@
\end{document}

That is because the inline R code is not written to the R script
during the tangling process. When an R package vignette contains
inline R code expressions that have significant side effects, R CMD
check can fail because the tangled output is not correct. What I
showed here is only a trivial example, and I have seen two packages
that have more complicated scenarios than this. Anyway, the key thing
that I want to discuss here is, since the R code in the vignette has
been executed once during the weaving process, does it make much sense
to execute the code generated from the tangle function? In other
words, if the weaving process has succeeded, is it necessary to
source() the R script again?

The two options here are:

1. Do not check the R code from vignettes;
2. Or fix the tangle function so that it produces exactly what was
executed in the weaving process. If this is done, I'm back to my
previous question: does it make sense to run the code twice?

To push this a little further, personally I do not quite appreciate
literate programming in R as two separate steps, namely weave and
tangle. In particular, I do not see the value of tangle, considering
Sweave() (or knitr::knit()) as the new source(). Therefore
eventually I tend to just drop tangle, but perhaps I missed something
here, and I'd like to hear what other people think about it.

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Web: http://yihui.name

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] The case for freezing CRAN

2014-03-20 Thread Kevin Coombes


On 3/20/2014 9:00 AM, Therneau, Terry M., Ph.D. wrote:



On 03/20/2014 07:48 AM, Michael Weylandt wrote:
On Mar 20, 2014, at 8:19, Therneau, Terry M., Ph.D. 
thern...@mayo.edu wrote:



There is a central assertion to this argument that I don't follow:

At the end of the day most published results obtained with R just 
won't be reproducible.


This is a very strong assertion. What is the evidence for it?


If I've understood Jeroen correctly, his point might be alternatively 
phrased as won't be reproducED (i.e., end user difficulties, not 
software availability).


Michael



That was my point as well.  Of the 30+ Sweave documents that I've 
produced I can't think of one that will change its output with a new 
version of R.  My 0/30 estimate is at odds with the nearly all 
assertion.  Perhaps I only do dull things?


Terry T.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


The only concrete example that comes to mind from my own Sweave reports 
was actually caused by BioConductor and not CRAN. I had a set of 
analyses that used DNAcopy, and the results changed substantially with a 
new release of the package in which they changed the default values of 
the main function call.  As a result, I've taken to writing out more of 
the defaults that I previously just accepted.  There have been a few 
minor issues similar to this one (with changes to parts of the Mclust 
package ??). So my estimates are somewhat higher than 0/30 but are still 
a long way from almost all.


Kevin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] suggestion for sets tools upgrade

2014-02-07 Thread Kevin Coombes
As a mathematician by training (and a former practicing mathematician, 
both of which qualifications I rarely feel compelled to pull out of the 
closet), I have to agree with Michael's challenge to the original 
assertion about the mathematical concept of sets.


Sets are collections of distinct objects (at least in Cantor's original 
naive definition) and do not have a notion of duplicate values.  In 
the modern axiomatic definition, one axiom is that two sets are equal 
if and only if they contain the same members. To expand on Michael's 
example, the union of {1, 2} with {1, 3} is {1, 2, 3}, not {1, 2, 1, 3} 
since there is only one distinct object designated by the value 1.


A computer programming language could choose to use the ordered vector 
(or list) [1, 2, 1, 3] as an internal representation of the union of 
[1,2] and [1,3], but it would then have to work hard to perform every 
other meaningful set operation.  For instance, the cardinality of the 
union still has to equal three (not four, which is the length of the 
list), since there are exactly three distinct objects that are members. 
And, as Michael points out, the set represented by [1,2,3] has to be 
equal to the set represented by [1,2,1,3] since they contain exactly the 
same members.
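
R's base set tools behave exactly this way; a quick check:

> union(c(1, 2), c(1, 3))
[1] 1 2 3
> length(union(c(1, 2), c(1, 3)))   # cardinality is 3, not 4
[1] 3
> setequal(c(1, 2, 3), c(1, 2, 1, 3))
[1] TRUE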


  Kevin

On 2/6/2014 9:39 PM, R. Michael Weylandt wrote:

On Thu, Feb 6, 2014 at 8:31 PM, Carl Witthoft c...@witthoft.com wrote:

First, let me apologize in advance if this is the wrong place to submit a
suggestion for a change to functions in the base-R package.  It never really
occurred to me that I'd have an idea worthy of such a change.

My idea is to provide an upgrade to all the sets tools (intersect, union,
setdiff, setequal) that allows the user to apply them in a strictly
algebraic style.

The current tools, as well documented, remove duplicate values in the input
vectors.  This can be helpful in stats work, but is inconsistent with the
mathematical concept of sets and set measure.

No comments about back-compatibility concerns, etc. but why do you
think this is closer to the mathematical concept of sets? As I
learned them, sets have no repeats (or order) and other languages with
set primitives tend to agree:

python> {1,1,2,3} == {1,2,3}
True

I believe C++ calls what you're looking for a multiset (albeit with a
guarantee of orderedness).
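
For what it's worth, a rough sketch of a multiset union in R, taking the
maximum of the element multiplicities (a hypothetical helper, not base R,
and only one of several possible conventions):

munion <- function(x, y) {
    # assumes numeric elements, for illustration only
    tx <- table(x)
    ty <- table(y)
    vals <- union(names(tx), names(ty))
    counts <- pmax(tx[vals], ty[vals], na.rm = TRUE)  # max multiplicity
    rep(as.numeric(vals), counts)
}
munion(c(1, 1, 2), c(1, 3))   # 1 1 2 3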

Cheers,
Michael

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R 3.0, Rtools3.0,l Windows7 64-bit, and permission agony

2013-04-21 Thread Kevin Coombes

Here's the short answer:  Whatever you used to do should still work.

I started this thread, not knowing that it was going to get sucked into 
a whirlpool on the fringes of an operating system religious war.  My 
sincerest apologies to everyone who has gotten confused as a consequence.


I only ran into a problem because I installed R 3.0 and Rtools 3.0 on a 
new machine, and accidentally put Rtools in a different location from 
where it used to reside on the machines I used for R 2.whatever.  And so 
the scripts I used to build packages no longer worked because the path 
was wrong. If you can avoid doing something silly like that, then your 
old methods for building and maintaining packages should work the same 
way they always did.


On 4/21/2013 8:22 PM, steven mosher wrote:

Well, color me confused as heck. I've upgraded to R 3.0 so that I can bring
my packages up to date, but the instructions surrounding Rtools30 are not a
model of clarity.


On Sun, Apr 21, 2013 at 4:04 PM, Gabor Grothendieck ggrothendi...@gmail.com

wrote:
On Sun, Apr 21, 2013 at 6:17 PM, Henrik Bengtsson h...@biostat.ucsf.edu
wrote:

I (as well) keep a specific Rsetup.bat file for launching Windows
cmd.exe with the proper PATH etc setup for build R packages etc.  It's
only after this thread I gave it a second thought; you can indeed
temporarily set the PATH via ~/.Rprofile or ~/.Renviron, which *are*
processed at the very beginning when calling 'R CMD ...'.

EXAMPLE WITH .Rprofile:

## ~/.Rprofile (e.g. C:/User/foo/.Rprofile):
path <- unlist(strsplit(Sys.getenv("PATH"), ";"));
path <- c("C:\\Rtools\\bin", "C:\\Rtools\\gcc-4.6.3\\bin", path);
Sys.setenv(PATH=paste(unique(path), collapse=";"));

## DISABLED:
x:\> R --no-init-file CMD INSTALL matrixStats_0.6.2.tar.gz
* installing to library 'C:/Users/hb/R/win-library/3.0'
* installing *source* package 'matrixStats' ...
** libs
*** arch - i386
ERROR: compilation failed for package 'matrixStats'
* removing 'C:/Users/hb/R/win-library/3.0/matrixStats'

## ENABLED:
x:\> R CMD INSTALL matrixStats_0.6.2.tar.gz
* installing to library 'C:/Users/hb/R/win-library/3.0'
* installing *source* package 'matrixStats' ...
** libs
*** arch - i386
gcc -m32 -IC:/PROGRA~1/R/R-3.0.0patched/include -DNDEBUG [...]
[...]
* DONE (matrixStats)


EXAMPLE WITH .Renviron:
## ~/.Renviron (e.g. C:/User/foo/.Renviron):
# Backslashes are preserved iff put within quotes
PATH="C:\Rtools\bin;C:\Rtools\gcc-4.6.3\bin;${PATH}"

x:\> R --no-environ CMD INSTALL matrixStats_0.6.2.tar.gz
=> fails

x:\> R CMD INSTALL matrixStats_0.6.2.tar.gz
=> works

As long as R is on the PATH, either of the approaches above removes
the need to add Rtools to the PATH via a BAT file, and it won't
clutter up your PATH.  This begs the question (as somewhat already
proposed): instead of users/developers doing this manually, would it
be possible to have 'R CMD ...' locate and add Rtools to the PATH
internally?  That would certainly lower the barriers for newcomers to
install packages from source that need compilation.  Obviously, this
doesn't make the tools (e.g. make) in Rtools available outside of R,
and it does not allow you to build R itself from source, but it does cover
the very common use cases of calling 'R CMD build/INSTALL/check/...'.

/Henrik

PS. Hadley, is this what you meant when you wrote "Better solutions
(e.g. Rstudio and devtools) temporarily set the path on when you're
calling R CMD *", or do those approaches apply only when you call 'R CMD'
from the R prompt?  I believe the latter, but I just want to make sure
I didn't miss something.

That seems like a reasonable approach although the code shown does
entail more setup and ongoing maintenance by the user than R.bat which
does not require that the user edit any files and additionally locates
R itself and has many other features.  Also, because R.bat locates R
itself it can be useful even if you are not doing development.  On the
other hand if you are looking to do development strictly from within R
then devtools is already developed.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R 3.0, Rtools3.0,l Windows7 64-bit, and permission agony

2013-04-20 Thread Kevin Coombes


On 4/20/2013 12:54 PM, Duncan Murdoch wrote:

On 13-04-20 12:30 PM, Gabor Grothendieck wrote:

On Sat, Apr 20, 2013 at 11:49 AM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:

On 13-04-20 11:09 AM, Gabor Grothendieck wrote:


On Sat, Apr 20, 2013 at 10:45 AM, Hadley Wickham h.wick...@gmail.com
wrote:


Just curious:  how often do you use the Windows find command?  We have
put instructions in place for people to run the install process with a
renamed Rtools find command (which I think is the only conflict).  The
issue is that more users who want to use the command line commands are
familiar with the Unix variant (which came first, by the way) than the
Windows one, so renaming the Rtools one would cause trouble for more
people.



It's not just find -- it's also sort. And really, R has no business
clobbering built-in Windows commands. This is just wrong and really
causes anyone who does any significant amount of Windows batch
programming (or uses batch programs of any complexity) endless
problems.



Which is presumably why Rtools doesn't modify the path by default.

Better solutions (e.g. Rstudio and devtools) temporarily set the path
on when you're calling R CMD *.



I am well aware of the various kludges to address this including my
own batchfiles ( http://batchfiles.googlecode.com ) which handles this
by temporarily changing the path as well; however, the real problem is
that Rtools does not play nice with Windows and that needs to be
addressed directly.



It has been.  You ignored it.

Duncan Murdoch



If some change to address this has been made that would be great but
there is no mention of it on the Rtools page in the change history
section (the only documented change relates to the png/tiff/jpeg
libraries), there was no announcement that I saw and Rtools\bin still
contains find and sort so what specifically is the change?


It's not a change to Rtools, it's a change to the build system in 
R:  it allows you to rename sort or find in your own copy of Rtools, 
and R will use whatever you specify.  You were informed of this when I 
did it in 2007, and I've mentioned it when the topic comes up here, 
most recently in the message quoted above.  That's a long time ago, so 
I don't remember if you tried it then, but I've never heard a 
complaint from anyone else that it doesn't work.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


How do you do that?  (More explicitly, what steps would I have to take 
to redefine things like find.exe and sort.exe in Rtools so that R would 
know how to find them and use them? I can't figure that out from the 
earlier parts of these messages.)


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R 3.0, Rtools3.0,l Windows7 64-bit, and permission agony

2013-04-20 Thread Kevin Coombes


On 4/20/2013 1:21 PM, Duncan Murdoch wrote:

On 13-04-20 2:02 PM, Kevin Coombes wrote:

On 4/20/2013 12:54 PM, Duncan Murdoch wrote:

It's not a change to Rtools, it's a change to the build system in
R:  it allows you to rename sort or find in your own copy of Rtools,
and R will use whatever you specify.  You were informed of this when I
did it in 2007, and I've mentioned it when the topic comes up here,
most recently in the message quoted above.  That's a long time ago, so
I don't remember if you tried it then, but I've never heard a
complaint from anyone else that it doesn't work.

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


How do you do that?  (More explicitly, what steps would I have to take
to redefine things like find.exe and sort.exe in Rtools so that R would
know how to find them and use them? I can't figure that out from the
earlier parts of these messages.)



Rename them to whatever you want in the Rtools install, then edit the 
definitions.  I think currently they are in src/gnuwin32/Makefile and 
src/gnuwin32/MkRules (one in each), but I'd suggest you just search 
files named M* for the strings sort and find, in case I've got it 
wrong, or it has changed since the last time I looked.


If you try to build R itself rather than just packages, you may need 
to do more edits, because some of the makefiles for things like the 
jpeg libraries weren't written by us, and may have these commands 
hard-coded.


Duncan Murdoch


To most Windows users, the Rtools install would seem to refer to 
getting the bundled Rtools30.exe from the CRAN web site, double-clicking 
on it, selecting the options from the GUI windows that appear, and 
clicking install.  There is no option in this procedure to change the 
names of find or sort.


As far as I can tell, the steps you are recommending take place in an 
earlier build step.  This would require the user who wants to do this to 
rebuild Rtools in its entirety, which is more trouble than it is likely 
to be worth, especially when you can avoid the problem by using your own 
batch script or perl script to reset the path on those relatively rare 
occasions when you need to use Rtools.  Since building Rtools for a 
Windows machine is something that CRAN does on a regular basis, why 
can't they just change the names there (and not bother the UNIX users, 
nor even the UNIX users who find themselves banished 
to the Windows wilderness)?  Just call them unixfind and unixsort 
and everyone will be able to figure it out.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R 3.0, Rtools3.0,l Windows7 64-bit, and permission agony

2013-04-19 Thread Kevin Coombes
Having finally found some free time, I was going to use it to update a 
bunch of R packages from 2.15 to 3.0.


I am running Windows 7, 64-bit professional.  This is on a brand-new 
laptop using vanilla settings when installing the operating system.


Problem 1: I installed R3.0 to the default location (C:\Program 
FIles\R\R-3.0.0).  The first thing I tried to do was install 
BioConductor.  This failed (permission denied). Thinking that this might 
be a BioConductor problem, I then tried to install a (semirandom) 
package from CRAN.  This also failed.


In both cases, when using the GUI, the error message is almost 
incomprehensible.  You get a pop-up window that *only* says "Do you want 
to use a private library instead?"  Since this wasn't what I wanted to 
do, I said no.  Only after the pop-up closes does the command window 
print the error message telling me that permission was denied for R to 
write to its own library location.


Dumb Fix to Problem 1: So, I uninstalled R and then reinstalled to a 
nonstandard location (C:\R\R-3.0.0).  Now I can successfully install 
packages from CRAN and BioConductor (hooray!). But I run directly into:


Problem 2: Emacs Speaks Statistics (ESS) can no longer find the R 
binary. When R was installed in the default location, ESS worked. When R 
2.15 (or earlier) was installed in the same nonstandard location, I 
could get ESS to find the R binaries by including (setq 
ess-directory-containing-r "C:") in my .emacs file, but that no longer 
works.


Dumb Fix to Problem 2:  Hack into ess-site.el and put the complete, 
explicit path to the correct binary into

(setq-default inferior-R-program-name "FULLPATHHERE")
which will break as soon as I upgrade R (assuming I am foolish enough to 
ever do that again).



Now I am ready to rebuild my R packages.  I have this nice perl script 
that goes through the following procedure:


1. Set the path to include the correct Rtools directory.  (For reasons 
that Gabor Grothendieck has pointed out previously, this is not a 
permanent part of the path since doing so would override some built-in 
Windows commands.)

2. Build a source tarball via
R CMD build $package
3. Build a Windows binary version (as a zip file) via
R CMD INSTALL --build $tarball
4. Check the package via
R CMD check --as-cran $tarball
5. Install the package via
R CMD INSTALL $tarball

Problem 3: Step 3 fails, with the error message "Running 'zip' failed".

Dumb Fix to Problem 3: Install the GnuWin32 version of zip, and make 
sure that its location is earlier in the path than the version that 
comes with Rtools.


Problem 4: Step 4 fails when running the test scripts that accompany the 
package.  The error message is the semicryptic
cannot open file 'c:\Users\krc\AppData\Local\Temp\Rtmp' 
Permission denied


Dumb Fix to Problem 4: Write this email message and hope someone with 
even more patience than I have has already found a better way to get all 
this stuff to work.


Tired of spinning my wheels,
Kevin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Does anyone use Sweave (RweaveLatex) option expand=FALSE?

2010-08-19 Thread Kevin Coombes
I use it, frequently. The idea for it goes back to some of Knuth's 
original literate programming ideas for developing weave and tangle when 
he was writing TeX (the program).  I want to be able to document the 
pieces of some complex algorithm without having to see all of the gory 
details.  For instance, I have code that looks like the following.  
(Note that this is typed on the fly rather than copied from actual 
source, so there may be typos.)


<<mainloop,keep.source=TRUE,expand=FALSE>>=
for (i in 1:nSamples) {
    <<getInfoAboutThisSample>>
    for (j in 1:nChromosomes) {
        <<getChromosomeDataForCurrentSample>>
        <<normalizeChromosomeData>>
        <<findSegments>>
        <<computeSignificance>>
        <<writeResults>>
    }
}
@

Each of the chunks is itself a fairly long piece of code defined and 
documented somewhere else.  (Some of them may themselves be written in 
the same form to reduce the final size of a chunk to something a human 
has a chance of understanding. That's the difference between weave and 
tangle in the original implementation.)   By blocking expansion, I can 
focus on the main steps without having them lost in pages and pages of code.
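
For instance, the findSegments chunk is then defined (and documented) in 
its own section, along the lines of (a made-up body, using DNAcopy here 
purely for concreteness):

<<findSegments>>=
cna <- CNA(logratio, chrom, maploc, data.type = "logratio")
segs <- segment(smooth.CNA(cna))   # smooth outliers, then segment
@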


So I vote strongly for retaining expand=FALSE.

Best,
   Kevin

Duncan Murdoch wrote:

On 19/08/2010 4:29 PM, Claudia Beleites wrote:

I never used it.

I got curious, though. What would be a situation that benefits of 
this option?
  


When I put it in, I thought it would be for people who were writing 
about Sweave.


Duncan Murdoch

Maybe a use case could be found by brute force (grep all .Rnw files 
on CRAN for the option?)


Claudia




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Does anyone use Sweave (RweaveLatex) option expand=FALSE?

2010-08-19 Thread Kevin Coombes
I picked the example from segmenting chromosomes for a reason.  I have a 
fair chunk of code that deals with not quite exceeding the amount of RAM 
available in the machine sitting on my desktop.  If I use functions, 
then the pass-by-value semantics of R will push me beyond the limits at 
some points.  (This is an empirical statement, not a theoretical one.  I 
was bitten by it several times while trying to analyze a couple of these 
datasets. And, yes, I know I can get around this by buying a bigger and 
better machine; it's on order...)  The real point is that using 
functions can be detrimental to the efficiency of the program, in ways 
that have real world consequences.
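
A small illustration of the copy cost (the sizes are made up for effect):

x <- matrix(0, 5000, 5000)            # about 200 MB of doubles
f <- function(m) { m[1, 1] <- 1; m }  # modifying the argument forces a copy
y <- f(x)                             # transiently needs room for both copies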


I haven't thought about doing the same thing with expressions. 
Expressions don't have quite the same semantics as chunks, and you'd 
have to make sure the evaluation was delayed so that you could use the 
current values of things that were computed in the meantime, and I 
already know how to do this with chunks without having to think so hard.


Using expressons would, however, help with the one difficulty that I 
have with reusing chunks (independent of whether or not I use 
'expand=FALSE').  I usually work inside emacs, using the 
emacs-speaks-statistics (ESS) package. ESS doesn't know how to evaluate 
the <<chunk>> call inside another chunk, so if I want to step through 
the code during development, I have to jump around myself to locate the 
source chunks.  With expressions that wouldn't matter.


As I ramble on about this, it occurs to me that the underlying issue is 
that chunks are not first class objects either in the LaTeX world or 
in the R world part of Sweave.  If there were a way to promote them to 
first class objects somehow, then it might make my use of ESS easier 
while simultaneously making it easier for Duncan to figure out how to 
report the correct line numbers.  But I only have an extremely vague 
idea of how one might start to do that...


   Kevin

Matt Shotwell wrote:

On Thu, 2010-08-19 at 17:07 -0400, Kevin Coombes wrote:
  
I use it, frequently. The idea for it goes back to some of Knuth's 
original literate programming ideas for developing weave and tangle when 
he was writing TeX (the program).  I want to be able to document the 
pieces of some complex algorithm without having to see all of the gory 
details.  For instance, I have code that looks like the following.  
(Note that this is typed on the fly rather than copied from actual 
source, so there may be typos.)


<<mainloop,keep.source=TRUE,expand=FALSE>>=
for (i in 1:nSamples) {
    <<getInfoAboutThisSample>>
    for (j in 1:nChromosomes) {
        <<getChromosomeDataForCurrentSample>>
        <<normalizeChromosomeData>>
        <<findSegments>>
        <<computeSignificance>>
        <<writeResults>>
    }
}
@

Each of the chunks is itself a fairly long piece of code defined and 
documented somewhere else.  (Some of them may themselves be written in 
the same form to reduce the final size of a chunk to something a human 
has a chance of understanding. That's the difference between weave and 
tangle in the original implementation.)   By blocking expansion, I can 
focus on the main steps without having them lost in pages and pages of code.





Couldn't you achieve the same amount of abstraction using function
calls, rather than embedded code chunks? The reader can then see real
code, rather than non-code, or meta-code, or whatever. Alternatively,
represent the code chunks as R expressions, then evaluate the
expressions at the appropriate points.
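
A sketch of that expression-based alternative (hypothetical names):

getInfoAboutThisSample <- quote(sampleInfo <- annotation[i, ])
## ... later, at the right point in the loop, with i and annotation in scope:
eval(getInfoAboutThisSample)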

-Matt

  

So I vote strongly for retaining expand=FALSE.

Best,
Kevin

Duncan Murdoch wrote:


On 19/08/2010 4:29 PM, Claudia Beleites wrote:
  

I never used it.

I got curious, though. What would be a situation that benefits of 
this option?
  

When I put it in, I thought it would be for people who were writing 
about Sweave.


Duncan Murdoch

  
Maybe a use case could be found by brute force (grep all .Rnw files 
on CRAN for the option?)


Claudia




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
  

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] library(): load library from a specified location

2010-04-06 Thread Kevin Coombes
If we're counting votes, then I vote no.  And I'd be willing to help 
stuff the ballot box and even volunteer to count the final tallies in 
order to make sure that the no side wins.


I understand the logical argument in favor of use or require or 
borrow. I am not swayed.


Backwards compatibility matters. A lot. This proposed change breaks an 
unfathomably large amount of existing code.  With zero gain in terms of 
performance or reliability.  It probably does not even help new users 
just learning the language, since they still have to be confused about 
why there are two functions that do almost the same thing in terms of 
loading packages.


Even with a long deprecation time, I don't see the value. Just train 
yourself to interpret

 library(aPackage)
as the syntactic form of the thing in R that has the semantic meaning: 
go to the library and bring back aPackage.


Curmudgeonly,
   Kevin

Martin Maechler wrote:

[ re-diverted to R-devel ]

  

Barry Rowlingson b.rowling...@lancaster.ac.uk
on Tue, 30 Mar 2010 20:15:00 +0100 writes:



 On Tue, Mar 30, 2010 at 7:58 PM, Rolf Turner
 r.tur...@auckland.ac.nz wrote:
 But ***please*** say ``load *package*'', not ``load
 library''.  The *location* (collection of packages) from
 which you wish to load the given package is the
 ``library''.

  Anyone vote for deprecating the library() function and
 renaming it use() or requiring require() instead?

I'm voting pro.   



We (R core) had planned to do this, probably about five to eight
years ago, then started discussing possible features of
the new use() function, of making a package into an object
that you'd want to interrogate, ...
and then probably got tired  ;-)

With the many moons passed, I'd now tend to *not* add features,
but really rename 'library' to 'use'
and create a library() with a deprecation message which then
simply calls use()...
and yes, I'd allow a very exceptionally long deprecation period
of two to five years before making library() defunct.

Martin

  I mean, when I go get a book out of our library, I don't
 say I'd like to library Case Studies in Spatial Point
 Process Modelling.  Maybe we should use
 'borrow(package)'? Then it might be clear you were getting
 a package from a library, and that you magically put it
 back at the end of your R session

  Slightly silly mood this evening



 Barry

 __
 r-h...@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
 read the posting guide
 http://www.R-project.org/posting-guide.html and provide
 commented, minimal, self-contained, reproducible code.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Defining a method in two packages

2010-03-09 Thread Kevin Coombes
Wouldn't it make sense to simply create a ranef package whose only 
role in the universe is to create the generic function that lme4, coxme, 
and anyone else who needs it could just import, without getting tons of 
additional and (depending on the application) irrelevant code?
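
Such a package could be nearly empty; a sketch (the package name and 
layout are hypothetical):

## R/ranef.R in a standalone "ranef" package:
ranef <- function(object, ...) UseMethod("ranef")

## Its NAMESPACE would just say:
##   export(ranef)
## and client packages would then declare, e.g.:
##   importFrom(ranef, ranef)
##   S3method(ranef, coxme)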


Best,
   Kevin

Uwe Ligges wrote:



On 08.03.2010 17:16, Terry Therneau wrote:

Brian & Uwe,
   Thanks for responding.  Let me see if I can refine the query and move
towards a solution.


From Uwe:
Of course, after loading lme4, you can still use the ranef from coxme:
coxme::ranef(fit)


Of course, but I'm interested in other users (as well as myself) and
prefer to avoid the 'secret handshake' form of a call.


In your own package, you could simply import the generic from coxme.


I don't understand this.


You could import the generic from the other package and define your 
own methods for it in order to make dispatching work correctly.





From Brian:
My solution would though be NOT to reuse a name that is already
established in another package (nlme has used it for many years).
The design problem is that generic foo() in package B might have
nothing to do with foo() in package A.  When it does, we expect B ...


I disagree completely.  It is precisely because of the prominence of nlme
and lmer that I want to reprise their methods: my users have a much
better chance of remembering how to do things.  If I followed this logic
to its conclusion one should never define a print() method because it
might conflict with the base definition.
   The consequence is that I am under obligation to NOT make my method
something different than Doug's, if I want to satisfy the goal of user
level consistency.  Several aspects of coxme purposefully mimic lmer,
even in cases (such as print.coxme) where his layout is not precisely
what I would have chosen.


Then please follow my suggestion and import the generic from the 
packages mentioned above in your namespace. Then you could extend it 
by your own methods without having to define another generic of the 
same name and avoid the conflicts.
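
Concretely, the relevant NAMESPACE lines would look something like this 
(assuming the generic is taken from nlme):

importFrom(nlme, ranef)
S3method(ranef, coxme)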




   I really do not want to require lme4 just to pick up the methods
definition.  It's a huge package, and there is no code in common.  Both
packages work very hard to be efficient via sparse matrix methods, but
the actual details are completely different due to the mathematical
structure of our underlying likelihoods.  Use of both in the same
analysis would be rare, so my issue won't be common.


Well, then things become complicated if not impossible.



The situation can be alleviated by making S3 methods visible.  Thus if
coxme exported coxme:::ranef.coxme and lme4 had a default method



ranef <- function(object, ...) UseMethod("ranef")


  I have no objection to exporting my method.  If a joint change to lme4
and coxme is the best solution, I will take the discussion off line with
Doug.  Is this the best way forward?


I think so.

Best wishes,
uwe




Terry






__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel