Re: [R-pkg-devel] R CMD check works but with devtools::check() examples fail
I think this is because the check systems set different environment variables. I had the same problem in February, and found out that R 3.6.0 (then still to come) adds a new environment variable, _R_CHECK_LENGTH_1_LOGIC2_. This *is* documented, but the documentation is well hidden in the R Internals manual. It says (or said when I looked at this in early February):

    _R_CHECK_LENGTH_1_LOGIC2_  Optionally check if either argument of the
    binary operators && and || has length greater than one.  The format is
    the same as for _R_CHECK_LENGTH_1_CONDITION_.
    Default: unset (nothing is reported)

R has for ages required that the condition in if(A && B) has length 1, but with this variable set it also requires that both A and B have length one (which is not the same thing). You need to find the place where this does not happen and fix it. If you look at the end of the error message, it even tells in which call you have a length > 1 component in your condition (it is given as length 3 in the diagnostic output).

I found this then because win-builder sets this environment variable, and there may be other build systems that do the same. You should fix these cases to avoid trouble.

Cheers, Jari Oksanen

On 16 May 2019, at 13:08, Gábor Csárdi <csardi.ga...@gmail.com> wrote:

> On Thu, May 16, 2019 at 10:56 AM Jack O. Wasey <j...@jackwasey.com> wrote:
>> Agree with Dirk, and also you are running R CMD check on the current directory,
>
> Why do you think so? Don't the lines below the "-- Building" header mean
> that devtools/rcmdcheck is building the package?
>
> G. [...]
── Building ─ rdtLite ──
Setting env vars:
● CFLAGS    : -Wall -pedantic -fdiagnostics-color=always
● CXXFLAGS  : -Wall -pedantic -fdiagnostics-color=always
● CXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always
✔ checking for file ‘/Users/blerner/git/rdtLite.check/rdtLite.Rcheck/00_pkg_src/rdtLite/DESCRIPTION’
─ preparing ‘rdtLite’:
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘rdtLite_1.0.3.tar.gz’
── Checking ─ [...]

__ R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
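The difference between a length-one *condition* and length-one *arguments* that Jari describes can be sketched in a few lines of plain R. This is only an illustration (the length-3 vector mirrors the diagnostic output mentioned above, and the all() fix is one common choice, not the only one):

```r
x <- c(TRUE, TRUE, FALSE)  # length 3, as in the diagnostic output
y <- TRUE

# if (x && y) ...
# With _R_CHECK_LENGTH_1_LOGIC2_ set, each *argument* of && and || must
# have length one, so the line above is flagged even though the separate
# *condition* check (_R_CHECK_LENGTH_1_CONDITION_) alone would not catch
# every such case.

# The usual fix is to reduce the offending argument to length one:
if (all(x) && y) res <- "all TRUE" else res <- "not all TRUE"
print(res)
```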
Re: [Rd] future time stamps warning
Could this be a timezone issue (setting the timezone on the local computer and communicating it to CRAN)? When I look at the email on my computer I see:

On Thu, Sep 20, 2018 at 11:46 AM Leo Lahti wrote:
> -rwxr-xr-x lei/lei 1447 2018-09-20 13:23 eurostat/DESCRIPTION

which seems to claim that eurostat/DESCRIPTION was nearly three hours younger than the email. This clearly was in the future back then. If so, waiting a couple of hours before submission could help, and there should be a proper solution, too (i.e., CRAN and you communicate the timezone, or both use the same one, such as UTC).

Cheers, Jari Oksanen

__ R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
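The apparent "future" stamp is easy to reproduce arithmetically. The sketch below simply compares the tar timestamp, naively read as UTC, against the mail time; reading the stamp as UTC is an assumption made for illustration, not a claim about what CRAN actually does:

```r
# File stamp as printed by tar, naively read as UTC:
stamp <- as.POSIXct("2018-09-20 13:23:00", tz = "UTC")
# Time the mail was sent, taken as UTC:
mail  <- as.POSIXct("2018-09-20 11:46:00", tz = "UTC")
# Read this way, the DESCRIPTION file appears to come from the future:
print(as.numeric(difftime(stamp, mail, units = "mins")))  # 97 minutes ahead
```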
Re: [Rd] conflicted: an alternative conflict resolution strategy
If you have to load two packages that both export the same name, namespaces do not help in resolving which of the synonymous functions to use. Neither does it help to have a package instead of a script, as long as you end up loading two namespaces with name conflicts. The order of importing namespaces can also be difficult to control, because you may load a namespace already when you start R with a saved workspace. Moving a function to another package may be a transitional issue which disappears when both packages are at their final stages, but if you use the recommended deprecation stage, the same names can live together for a long time.

So this package is a good idea, and preferably base R should be able to handle the issue of choosing between exported synonymous functions. This has bitten me several times in package development, and with a growing CRAN it is a growing problem. Package authors often have poor control over the issue, as they do not know what packages their users use. Now we can only have a FAQ entry explaining that a certain error message comes not from a function in our package, but from a synonymous function in some other package that was used instead.

cheers, Jari Oksanen

On 23 Aug 2018, at 23:46, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

First, some general comments: This sounds like a useful package. I would guess it has very little impact on runtime efficiency except when attaching a new package; have you checked that?

I am not so sure about your heuristics. Can they be disabled, so the user is always forced to make the choice? Even when a function is intended to adhere to the superset principle, its authors don't always get it right, so a really careful user should always do explicit disambiguation.
And of course, if users wrote most of their long scripts as packages instead of as long scripts, the ambiguity issue would arise far less often, because namespaces in packages are intended to solve the same problem as your package does.

One more comment inline about a typo, possibly in an error message.

Duncan Murdoch

On 23/08/2018 2:31 PM, Hadley Wickham wrote:

Hi all,

I’d love to get your feedback on the conflicted package, which provides an alternative strategy for resolving ambiguous function names (i.e. when multiple packages provide identically named functions). conflicted 0.1.0 is already on CRAN, but I’m currently preparing a revision (<https://github.com/r-lib/conflicted>), and looking for feedback.

As you are no doubt aware, R’s default approach means that the most recently loaded package “wins” any conflicts. You do get a message about conflicts on load, but I see a lot of newer R users experiencing problems caused by function conflicts. I think there are three primary reasons:

- People don’t read messages about conflicts. Even if you are conscientious and do read the messages, it’s hard to notice a single new conflict caused by a package upgrade.

- The warning and the problem may be quite far apart. If you load all your packages at the top of the script, it may potentially be 100s of lines before you encounter a conflict.

- The error messages caused by conflicts are cryptic, because you end up calling a function with utterly unexpected arguments.

For these reasons, conflicted takes an alternative approach, forcing the user to explicitly disambiguate any conflicts:

library(conflicted)
library(dplyr)
library(MASS)

select
#> Error: [conflicted] `select` found in 2 packages.
#> Either pick the one you want with `::`
#> * MASS::select
#> * dplyr::select
#> Or declare a preference with `conflicted_prefer()`
#> * conflict_prefer("select", "MASS")
#> * conflict_prefer("select", "dplyr")

I don't know if this is a typo in your r-devel message or a typo in the error message, but you say `conflicted_prefer()` in one place and conflict_prefer() in the other.

conflicted works by attaching a new “conflicted” environment just after the global environment. This environment contains an active binding for any ambiguous bindings. The conflicted environment also contains bindings for `library()` and `require()` that rebuild the conflicted environment and suppress default reporting (but are otherwise thin wrappers around the base equivalents).

conflicted also provides a `conflict_scout()` helper which you can use to see what’s going on:

conflict_scout(c("dplyr", "MASS"))
#> 1 conflict:
#> * `select`: dplyr, MASS

conflicted applies a few heuristics to minimise false positives (at the cost of introducing a few false negatives). The overarching goal is to ensure that code behaves identically regardless of the order in which packages are attached.

- A number of packages provide a function that appears to conflict with a function in a base package, but the
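The active-binding mechanism Hadley describes can be imitated in a few lines of base R. The following is only a rough sketch of the idea (the error message is made up here), not the conflicted implementation itself, which attaches a dedicated environment on the search path:

```r
# Shadow an ambiguous name with an active binding that refuses to guess,
# instead of silently using whichever package was attached last:
makeActiveBinding("select", function() {
  stop("`select` is ambiguous: use MASS::select or dplyr::select")
}, environment())

# Any bare use of `select` now fails loudly:
msg <- tryCatch(select, error = conditionMessage)
print(msg)
```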
Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check
I am afraid that these suggestions may not work. There are more choices than Win32 and Win64, including several flavours of BLAS/LAPACK, which probably are involved when you evaluate eigenvalues, and also differences in hardware, compilers and the phase of the moon. If there are several equal eigenvalues, any solution for the axes is arbitrary, and it can be made stable for testing only by chance. If you have M equal eigenvalues, you should try to find a test that checks that the M-dimensional (sub)space is approximately correct, irrespective of the random orientation of the axes within this subspace.

Cheers, Jari Oksanen

On 18 May 2018, at 00:06, Kevin Coombes <kevin.r.coom...@gmail.com> wrote:

Yes; but I have been running around all day without time to sit down and try them. The suggestions make sense, and I'm looking forward to implementing them.

On Thu, May 17, 2018, 3:55 PM Ben Bolker <bbol...@gmail.com> wrote:

There have been various comments in this thread (by me, and I think Duncan Murdoch) about how you can identify the platform you're running on (some combination of .Platform and/or R.Version()) and use it to write conditional statements so that your tests will only be compared with reference values that were generated on the same platform ... did those get through? Did they make sense?

On Thu, May 17, 2018 at 3:30 PM, Kevin Coombes <kevin.r.coom...@gmail.com> wrote:

Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are the issue. The matrices I am using are all nonsingular, and the various algorithms have no problem computing the eigenvalues correctly (up to numerical errors that I can bound and thus account for in tests by rounding appropriately). But an eigenvalue of multiplicity M has an M-dimensional eigenspace with no preferred basis. So, any M-dimensional (unitary) change of basis is permitted. That's what gives rise to the lack of reproducibility across architectures.
The choice of basis appears to use different heuristics on 32-bit Windows than on 64-bit Windows or Linux machines. As a result, I can't include the tests I'd like as part of a CRAN submission.

On Thu, May 17, 2018, 2:29 PM William Dunlap <wdun...@tibco.com> wrote:

Your explanation needs to be a bit more general in the case of identical eigenvalues - each distinct eigenvalue has an associated subspace, whose dimension is the number of repeats of that eigenvalue, and the eigenvectors for that eigenvalue are an orthonormal basis for that subspace. (With no repeated eigenvalues this gives your 'unique up to sign'.) E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of 0

x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    1
[2,]    0    1    0    0    1
[3,]    0    0    1    0    1
[4,]    0    0    0    0    0
[5,]    1    1    1    0    3

the following give valid but different (by more than sign) eigenvectors

e1 <- structure(list(values = c(4, 1, 0.999, 0, -2.22044607159862e-16),
    vectors = structure(c(-0.288675134594813, -0.288675134594813,
    -0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
    -0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
    -0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
    -0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))),
    .Names = c("values", "vectors"), class = "eigen")
e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
    vectors = structure(c(0.288675134594813, 0.288675134594813,
    0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
    0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
    0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
    -1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5, 0, 0.5),
    .Dim = c(5L, 5L))), .Names = c("values", "vectors"), class = "eigen")

I.e.,

all.equal(crossprod(e1$vectors), diag(5), tol=0)
[1] "Mean relative difference: 1.407255e-15"
all.equal(crossprod(e2$vectors), diag(5), tol=0)
[1] "Mean relative difference: 3.856478e-15"
all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)
[1] "Mean relative difference: 1.110223e-15"
all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)
[1] "Mean relative difference: 9.069735e-16"

e1$vectors
           [,1]       [,2]          [,3] [,4] [,5]
[1,] -0.2886751  0.0000000  8.164966e-01    0 -0.5
[2,] -0.2886751  0.7071068 -4.082483e-01    0 -0.5
[3,] -0.2886751 -0.7071068 -4.082483e-01    0 -0.5
[4,]  0.0000000  0.0000000  0.000000e+00   -1  0.0
[5,] -0.8660254  0.0000000 -6.106227e-16    0  0.5

e2$vectors
           [,1]          [,2]          [,3] [,4] [,5]
[1,]  0.2886751 -7.844376e-01  2.265489e-01    0 -0.5
[2,]  0.2886751
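A rotation-invariant test along the lines Jari and Bill suggest can compare the projector onto the repeated-eigenvalue subspace, which is unique, instead of the individual eigenvectors, which are not. A sketch using Bill's matrix (the tolerance here is an arbitrary choice):

```r
# Bill's example matrix, with eigenvalues 4, 1, 1, 0, 0:
x <- tcrossprod(cbind(c(1,0,0,0,1), c(0,1,0,0,1), c(0,0,1,0,1)))
e <- eigen(x)
idx <- which(abs(e$values - 1) < 1e-8)   # the repeated eigenvalue 1
V <- e$vectors[, idx, drop = FALSE]
P <- tcrossprod(V)                       # projector onto the subspace: V V'
# P is the same for every orthonormal basis of the subspace, so it is a
# stable quantity to compare against a platform-independent reference:
print(isTRUE(all.equal(P %*% P, P)))     # projectors are idempotent
```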
Re: [Rd] importing namespaces from base packages
It seems that they are defined in tools/R/check.R. For instance, lines 363-364 say:

    ## The default set of packages here are as they are because
    ## .get_S3_generics_as_seen_from_package needs utils,graphics,stats

and then on lines 368 (Windows) and 377 (other OSes) it has:

    "R_DEFAULT_PACKAGES=utils,grDevices,graphics,stats"

So these pass R CMD check and are an "industrial standard". Changing this would break half of the CRAN packages.

Cheers, Jari Oksanen

On 13/03/18 13:47, Martin Maechler wrote:
>>>>> Adrian Dușa <dusa.adr...@unibuc.ro> on Tue, 13 Mar 2018 09:17:08 +0200 writes:

> On Mon, Mar 12, 2018 at 2:18 PM, Martin Maechler <maech...@stat.math.ethz.ch> wrote:
>> [...]
>> Is that so? Not according to my reading of the 'Writing R
>> Extensions' manual, nor according to what I have been doing in
>> all of my packages for ca. 2 years:
>>
>> The rule I have in my mind is
>>
>> 1) NAMESPACE Import(s|From) \
>>    <==> DESCRIPTION -> 'Imports:'
>> 2) .. using "::" in R code /
>>
>> If you really found that you did not have to import from say
>> 'utils', I think this was a *un*lucky coincidence.

> Of course, the importFrom() is mandatory in NAMESPACE, otherwise the package
> does not pass the checks.
> The question was related to the relation between the packages mentioned in
> the NAMESPACE and the packages mentioned in the Imports: field of the
> DESCRIPTION.
> For instance, the current version 3.1 of package QCA on CRAN mentions in
> the DESCRIPTION:
>
> Imports: venn (≥ 1.2), shiny, methods, fastdigest
>
> while the NAMESPACE file has:
>
> import(shiny)
> import(venn)
> import(fastdigest)
> importFrom("utils", "packageDescription", "remove.packages", "capture.output")
> importFrom("stats", "glm", "predict", "quasibinomial", "binom.test",
>            "cutree", "dist", "hclust", "na.omit", "dbinom", "setNames")
> importFrom("grDevices", "dev.cur", "dev.new", "dev.list")
> importFrom("graphics", "abline", "axis", "box", "mtext", "par", "title", "text")
> importFrom("methods", "is")
>
> There are functions from packages utils, stats, grDevices and graphics for
> which the R checks do not require a specific entry in the Imports: field.
> I suspect because all of these packages are part of base R, but so is
> package methods. The question is why it is not mandatory for those packages
> to be mentioned in the Imports: field from DESCRIPTION, while removing
> package methods from that field runs into an error, despite maintaining the
> package in the NAMESPACE's importFrom().

Thank you, Adrian, for the clarification of your question. As a matter of fact, I was not aware of what you showed above, and personally I think I do add every package/namespace mentioned in NAMESPACE to the DESCRIPTION's "Imports:" field. AFAIK the above phenomenon is not documented, and rather the docs would imply that this phenomenon might go away -- I for one would vote for more consistency here ..

Martin

>> [...]
>> There are places in the R source where it is treated specially,
>> indeed, part of 'methods' may be needed when it is neither
>> loaded nor attached (e.g., when R runs with only base, say, and
>> suddenly encounters an S4 object), and there still are
>> situations where 'methods' needs to be in the search() path and
>> not just loaded, but these cases should be unrelated to the
>> above DESCRIPTION-Imports vs NAMESPACE-Imports correspondence.
> This is what I had expected myself, so the above behavior has to have
> another explanation.
> It is just a curiosity; there is naturally nothing wrong with maintaining
> package methods in the Imports: field. It is only odd that some base R
> packages are treated differently from other base R packages at the
> package check stage.
> Thank you,
> Adrian

> --
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr. 90-92
> 050663 Bucharest sector 5
> Romania
> https://adriandusa.eu
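The default-packages mechanism discussed above can be inspected from within a session; a small sketch (what the variable contains depends on how the session was started):

```r
# Which packages R attaches on startup is controlled by the
# R_DEFAULT_PACKAGES environment variable (set by R CMD check in the
# lines quoted above) and surfaces as the "defaultPackages" option:
print(Sys.getenv("R_DEFAULT_PACKAGES", unset = "<unset>"))
print(getOption("defaultPackages"))
```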
Re: [Rd] Why R should never move to git
This is exactly the instruction given in https://xkcd.com/1597/

cheers, J.O.

On 25/01/18 14:48, Mario Emmenlauer wrote:

Hi Duncan!

I think there are many users whose first experiences with git were frustrating, and trust me, many people here can relate to your pain. I can certainly say that I can. At first, it takes significant effort to become fluent in seemingly "simple" git tasks. I can literally feel your pain right now. But this is the main downside of git: that it can be hard to learn.

I overcame this problem by collecting copy-paste instructions for the most common tasks. I think Dirk provided a very nice starting point for a typical pull request, and next time you need to use git, maybe try his instructions. They are *exactly* what I use at least once a week. However, they are not 1:1 for your current situation, where you already started a fork. If you want to solve your current "mess", I personally find it easiest to move all local changes away (to /tmp/ or wherever), trash the GitHub fork, and start over with Dirk's instructions. At point (4) you can copy your changed files back from /tmp/ and use them for new commits, in this new, clean branch. Everything else should just work.

Cheers, Mario

On 25.01.2018 13:09, Duncan Murdoch wrote:

On 25/01/2018 6:49 AM, Dirk Eddelbuettel wrote:

On 25 January 2018 at 06:20, Duncan Murdoch wrote:
| On 25/01/2018 2:57 AM, Iñaki Úcar wrote:
| > For what it's worth, this is my workflow:
| >
| > 1. Get a fork.
| > 2. From the master branch, create a new branch called fix-[something].
| > 3. Put together the stuff there, commit, push and open a PR.
| > 4. Checkout master and repeat from 2 to submit another patch.
| >
| > Sometimes, I forget the step of creating the new branch and I put my
| > fix on top of the master branch, which complicates things a bit. But
| > you can always rename your fork's master and pull it again from
| > upstream.
|
| I saw no way to follow your renaming suggestion.
Can you tell me the steps it would take? Remember, there's already a PR from the master branch on my fork. (This is for future reference; I already followed Gabor's more complicated instructions and have solved the immediate problem.)

1) Via GUI: fork or clone at GitHub so that you have a URL to use in 2)

GitHub would not allow me to fork, because I already had a fork of the same repository. I suppose I could have set up a new user and done it. I don't know if cloning the original would have made a difference. I don't have permission to commit to the original, and the manipulateWidget maintainers wouldn't be able to see my private clone, so I don't see how I could create a PR that they could use.

Once again, let me repeat: this should be an easy thing to do. So far I'm pretty convinced that it's actually impossible to do on the GitHub website without hacks like creating a new user. It's not trivial, but not that difficult, for a git expert using command-line git. If R Core chose to switch the R sources to git and used GitHub to host a copy, problems like mine would come up fairly regularly. I don't think R Core would gain enough from the switch to compensate for the burden of dealing with these problems. Maybe GitLab or some other front end would be better.

Duncan Murdoch

2) Run git clone giturl to fetch a local instance
3) Run git checkout -b feature/new_thing_a (this is 2. above by Iñaki)
4) Edit, save, compile, test, revise, ... leading to 1 or more commits
5) Run git push; a standard configuration should have the remote branch follow the local branch. I think the "long form" is git push --set-upstream origin feature/new_thing_a
6) Run git checkout - or git checkout master and you are back in master.
Now you can restart at my 3) above for branches b, c, d and create independent pull requests.

I find it really useful to have a bash prompt that shows the branch:

edd@rob:~$ cd git/rcpp
edd@rob:~/git/rcpp(master)$ git checkout -b feature/new_branch_to_show
Switched to a new branch 'feature/new_branch_to_show'
edd@rob:~/git/rcpp(feature/new_branch_to_show)$ git checkout -
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
edd@rob:~/git/rcpp(master)$ git branch -d feature/new_branch_to_show
Deleted branch feature/new_branch_to_show (was 5b25fe62).
edd@rob:~/git/rcpp(master)$

There are a few tutorials out there about how to do it; I once got mine from Karthik when we did a Software Carpentry workshop. Happy to detail off-list; it adds less than 10 lines to ~/.bashrc.

Dirk

| Duncan Murdoch
|
| > Iñaki
| >
| > 2018-01-25 0:17 GMT+01:00 Duncan Murdoch:
| >> Lately I've been doing some work with the manipulateWidget package, which
| >> lives on Github at
| >>
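The branch-per-fix workflow in steps 2)-6) can be exercised end to end in a throwaway local repository; a self-contained sketch (in real use the clone would of course come from your GitHub fork, and the branch name here is a placeholder):

```shell
set -e
# Throwaway repo standing in for a cloned fork:
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git -c user.email=a@b.c -c user.name=demo commit -q --allow-empty -m "initial"

git checkout -q -b fix-something          # step 2: branch off the base branch
echo fix > file.txt && git add file.txt
git -c user.email=a@b.c -c user.name=demo commit -q -m "Fix something"
# (step 3 would now be: git push --set-upstream origin fix-something, then PR)

git checkout -q -                          # step 4: back to the base branch
git branch --list 'fix-*'                  # the fix branch is ready for a PR
```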
Re: [Rd] Are r2dtable and C_r2dtable behaving correctly?
It is not about a "really large total number of observations", but:

set.seed(4711)
tabs <- r2dtable(1e6, c(2, 2), c(2, 2))
A11 <- vapply(tabs, function(x) x[1, 1], numeric(1))
table(A11)

A11
     0      1      2 
166483 666853 166664 

There are three possible matrices, and these come out in proportions 1:4:1, the one with all cells filled with ones being the most common.

Cheers, Jari O.

From: R-devel on behalf of Martin Maechler
Sent: 25 August 2017 11:30
To: Gustavo Fernandez Bayon
Cc: r-devel@r-project.org
Subject: Re: [Rd] Are r2dtable and C_r2dtable behaving correctly?

> Gustavo Fernandez Bayon on Thu, 24 Aug 2017 16:42:36 +0200 writes:

> Hello,
> While doing some enrichment tests using chisq.test() with simulated
> p-values, I noticed some strange behaviour. The computed p-value was
> extremely small, so I decided to dig a little deeper and debug
> chisq.test(). I noticed then that the simulated statistics returned by the
> following call
>
>   tmp <- .Call(C_chisq_sim, sr, sc, B, E)
>
> were all the same, very small numbers. This, at first, seemed strange to
> me. So I decided to do some simulations myself, and started playing around
> with the r2dtable() function. Problem is, using my row and column
> marginals, r2dtable() always returns the same matrix. Let's provide a
> minimal example:
>
>   rr <- c(209410, 276167)
>   cc <- c(25000, 460577)
>   ms <- r2dtable(3, rr, cc)
>
> I have tested this code on two machines and it always returned the same
> list of length three containing the same matrix three times. The repeated
> matrix is the following:
>
> [[1]]
>       [,1]   [,2]
> [1,] 10782 198628
> [2,] 14218 261949
>
> [[2]]
>       [,1]   [,2]
> [1,] 10782 198628
> [2,] 14218 261949
>
> [[3]]
>       [,1]   [,2]
> [1,] 10782 198628
> [2,] 14218 261949

Yes. You can also do unique(r2dtable(100, rr, cc)) and see that the result is constant.
I'm pretty sure this is still due to some integer overflow, in spite of the fact that I had spent quite some time to fix such a problem in Dec 2003; see the 14-year-old bug PR#5701 https://bugs.r-project.org/bugzilla/show_bug.cgi?id=5701#c2

It has to be said that this is based on an algorithm published in 1981, specifically - from help(r2dtable) -

Patefield, W. M. (1981) Algorithm AS159. An efficient method of generating r x c tables with given row and column totals. _Applied Statistics_ *30*, 91-97.

For those with JSTOR access (typically via your University), available at http://www.jstor.org/stable/2346669

When I start reading it, indeed the algorithm seems to start from the expected value of a cell entry and then "explore from there" ... and I do wonder if there is not a flaw somewhere in the algorithm: I've now found that a bit more than a year ago, 'paljenczy' found on SO https://stackoverflow.com/questions/37309276/r-r2dtable-contingency-tables-are-too-concentrated that indeed the generated tables seem to be too concentrated around the mean. Basically his example:

> set.seed(1); system.time(tabs <- r2dtable(1e6, c(100, 100), c(100, 100)))
   user  system elapsed 
  0.218   0.025   0.244 
> A11 <- vapply(tabs, function(x) x[1, 1], numeric(1))
> table(A11)
A11
    34     35     36     37     38     39     40     41     42     43 
     2     17     40    129    334    883   2026   4522   8766  15786 
    44     45     46     47     48     49     50     51     52     53 
 26850  42142  59535  78851  96217 107686 112438 108237  95761  78737 
    54     55     56     57     58     59     60     61     62     63 
 59732  41474  26939  16006   8827   4633   2050    865    340    116 
    64     65     66     67 
    38     13      7      1 

For a 2x2 table there's really only one degree of freedom, hence the above characterizes the full distribution for that case. I would have expected to see all possible values in 0:100 instead of such a "normal like" distribution with carrier only in [34, 67]. There are newer publications and maybe algorithms.
So maybe the algorithm is "flawed by design" for a really large total number of observations, rather than wrong. Seems interesting ...

Martin Maechler
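For a 2x2 table with both margins fixed, the exact null distribution of the [1,1] cell is hypergeometric, so the simulated tables can be checked directly against dhyper(). A sketch of such a check on Jari's small example (sample size reduced for speed); for the tiny margins the 1:4:1 proportions do match the hypergeometric, which is why his point is about the large-margin case:

```r
set.seed(4711)
tabs <- r2dtable(1e5, c(2, 2), c(2, 2))
A11  <- vapply(tabs, function(x) x[1, 1], numeric(1))
observed <- as.vector(prop.table(table(factor(A11, levels = 0:2))))
# P(A11 = x) for margins (2,2) x (2,2) is hypergeometric: 1/6, 4/6, 1/6
expected <- dhyper(0:2, m = 2, n = 2, k = 2)
print(round(rbind(observed, expected), 3))
```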
Re: [Rd] registering Fortran routines in R packages
Have you tried using tools:::package_native_routine_registration_skeleton()? If you don't like its output, you can easily edit its results and still avoid most pitfalls.

Cheers, Jari Oksanen

From: R-devel <r-devel-boun...@r-project.org> on behalf of Berend Hasselman <b...@xs4all.nl>
Sent: 10 May 2017 09:48
To: Christophe Dutang
Cc: r-devel@r-project.org
Subject: Re: [Rd] registering Fortran routines in R packages

Christophe,

> On 10 May 2017, at 08:08, Christophe Dutang <duta...@gmail.com> wrote:
>
> Thanks for your email.
>
> I tried changing the names to lowercase, but that conflicts with a C
> implementation also named halton. So I renamed the C functions halton2()
> and sobol2(), while the Fortran functions are HALTON() and SOBOL() (I also
> tried lowercase in the Fortran code). Unfortunately, it does not help,
> since I get
>
> init.c:97:25: error: use of undeclared identifier 'halton_'; did you mean 'halton2'?
>     {"halton", (DL_FUNC) &F77_SUB(halton), 7},
>
> My current solution is to comment out the FortEntries array and use
> R_useDynamicSymbols(dll, TRUE) for a dynamic search of the Fortran routines.

Have a look at my package geigen and its init.c. Could it be that you are missing extern declarations for the Fortran routines?

Berend
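For reference, a registration table for a Fortran routine usually looks roughly like the following init.c fragment. This is only an illustrative sketch (the routine name, argument count and argument types are made up, as is the package name), not Christophe's actual code; the F77_NAME declaration is the kind of extern declaration Berend asks about:

```c
#include <R_ext/RS.h>
#include <R_ext/Rdynload.h>

/* extern declaration of the Fortran symbol (hypothetical signature) */
void F77_NAME(halton)(double *u, int *n, int *dim, int *init,
                      int *method, double *work, int *ierr);

static const R_FortranMethodDef FortEntries[] = {
    {"halton", (DL_FUNC) &F77_NAME(halton), 7},
    {NULL, NULL, 0}
};

void R_init_mypkg(DllInfo *dll)   /* "mypkg" is a placeholder */
{
    R_registerRoutines(dll, NULL, NULL, FortEntries, NULL);
    R_useDynamicSymbols(dll, FALSE);
}
```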
[Rd] Some "lm" methods give wrong results when applied to "mlm" objects
I had a look at some influence measures, and it seems to me that several methods currently handle multiple-response lm ("mlm") objects wrongly in R. In some cases there are separate "mlm" methods, but usually "mlm" objects are handled by the same methods as univariate "lm" objects, and in some cases this fails. There are two general patterns of problems in the influence measures:

1) The univariate methods assume that the overall standard deviation (sd) has length one, but for "mlm" models we have a multivariate response with a multi-column residual matrix. The functions do correctly get the sd vector corresponding to the columns, but it is not applied to those columns; instead it is recycled along the rows. This influences rstandard.lm and cooks.distance.lm. For instance, in cooks.distance.lm we have

    ((res/(sd * (1 - hat)))^2 * hat)/p

where res is an n x m matrix, sd is an m-vector and hat is an n-vector. Both of these functions are very easily fixed.

2) Another problem is that several functions are based on the lm.influence function, and it seems that it returns elements sigma and coefficients that are based only on the first variable (the first column of the residual matrix wt.res) and gives wrong results for the other variables. This will influence the functions dfbeta.lm (coefficients), dfbetas.lm (coefficients, sigma), dffits (sigma), rstudent.lm (sigma) and covratio (sigma). lm.influence finds these elements in compiled code, and this is harder to fix. MASS (the book & the package) avoids using compiled code in its (univariate) studentized residuals, and instead uses a clever short-cut.

In addition to these, there are a couple of other cases which seem to fail with "mlm" models:

confint.lm gives an empty result, because the length of the results is defined by names(coef(object)), which is NULL because "mlm" objects return a matrix of coefficients instead of a vector with names.

dummy.coef fails because "mlm" objects do not have an xlevels item.

extractAIC.lm returns only one value instead of a vector, and edf is misleading.
A separate deviance.mlm method returns a vector of deviances, whereas logLik.lm returns "'logLik.lm' does not support multiple responses". Probably extractAIC.lm should work like logLik.lm.

Several methods already handle "mlm" objects by returning the message " is not yet implemented for multivariate lm()", which of course is a natural and correct solution to the problems.

Cheers, Jari Oksanen
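The recycling problem in pattern 1) can be seen without fitting any model; a minimal sketch (the matrix values here are arbitrary):

```r
# res: n x m residual matrix (n = 3 observations, m = 2 responses)
res <- matrix(1:6, nrow = 3)
sd  <- c(1, 10)                  # one residual sd per response column
wrong <- res / sd                # recycles sd down the rows (the bug)
right <- sweep(res, 2, sd, "/")  # scales each column by its own sd
print(wrong)
print(right)
```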
Re: [Rd] typo or stale info in qr man
And that missing functionality is that the LINPACK/LAPACK routines do not return the rank and have a different style of pivoting? In other respects the user interface is very similar in R's dqrdc2 and LINPACK's dqrdc. Another difference seems to be that the final pivoting reported to the user is different: R keeps the original order except for aliased variables, but LINPACK makes either wild shuffling or no pivoting at all. I haven't looked at dgeqp3 in LAPACK, but it appears not to return the rank either (I don't know about its shuffling of the columns).

It seems that using LINPACK's dqrdc directly is not always compatible with R's dqrdc2, although it returns similar objects. That is, when packing up the LINPACK result into an object with the same items as a qr.default result (qr, rank, qraux, pivot, class "qr"), the result object may not yield the same results in base::qr.fitted, base::qr.resid etc. as a base::qr.default result (but I haven't had time for thorough testing). This is how I tried to do the packing (apologies for clumsy coding):

#include <R.h>
#include <Rinternals.h>
#include <R_ext/Linpack.h>

SEXP do_QR(SEXP x, SEXP dopivot)
{
    /* set up */
    int i;
    int nr = nrows(x), nx = ncols(x);
    int pivoting = asInteger(dopivot);
    SEXP qraux = PROTECT(allocVector(REALSXP, nx));
    SEXP pivot = PROTECT(allocVector(INTSXP, nx));
    /* do pivoting or keep the order of columns? */
    if (pivoting)
        memset(INTEGER(pivot), 0, nx * sizeof(int));
    else
        for (i = 0; i < nx; i++)
            INTEGER(pivot)[i] = i + 1;
    double *work = (double *) R_alloc(nx, sizeof(double));
    int job = 1;
    x = PROTECT(duplicate(x));
    /* QR decomposition with LINPACK:
       dqrdc(x, ldx, n, p, qraux, jpvt, work, job) */
    F77_CALL(dqrdc)(REAL(x), &nr, &nr, &nx, REAL(qraux), INTEGER(pivot),
                    work, &job);
    /* pack up */
    SEXP qr = PROTECT(allocVector(VECSXP, 4));
    SEXP labs = PROTECT(allocVector(STRSXP, 4));
    SET_STRING_ELT(labs, 0, mkChar("qr"));
    SET_STRING_ELT(labs, 1, mkChar("rank"));
    SET_STRING_ELT(labs, 2, mkChar("qraux"));
    SET_STRING_ELT(labs, 3, mkChar("pivot"));
    setAttrib(qr, R_NamesSymbol, labs);
    SEXP cl = PROTECT(allocVector(STRSXP, 1));
    SET_STRING_ELT(cl, 0, mkChar("qr"));
    classgets(qr, cl);
    UNPROTECT(2); /* cl, labs */
    SET_VECTOR_ELT(qr, 0, x);
    SET_VECTOR_ELT(qr, 1, ScalarInteger(nx)); /* not really the rank, but no. of columns */
    SET_VECTOR_ELT(qr, 2, qraux);
    SET_VECTOR_ELT(qr, 3, pivot);
    UNPROTECT(4); /* qr, x, pivot, qraux */
    return qr;
}

cheers, Jari Oksanen

From: R-devel <r-devel-boun...@r-project.org> on behalf of Martin Maechler <maech...@stat.math.ethz.ch>
Sent: 25 October 2016 11:08
To: Wojciech Musial (Voitek)
Cc: R-devel@r-project.org
Subject: Re: [Rd] typo or stale info in qr man

>>>>> Wojciech Musial (Voitek) <wojciech.mus...@gmail.com>
>>>>> on Mon, 24 Oct 2016 15:07:55 -0700 writes:

> man for `qr` says that the function uses LINPACK's DQRDC, while it in
> fact uses DQRDC2, which is a modification of LINPACK's DQRDC.

But you are right, and I have added to the help file (and a tiny bit to the comments in the Fortran source). When this change was done > 20 years ago, it was still hoped that the numerical linear algebra community, or more specifically those behind LAPACK, would eventually provide this functionality with LAPACK (and we would then use that), but that has never happened to my knowledge.

Thank you for the 'heads up'.
Martin Maechler ETH Zurich

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] summary( prcomp(*, tol = .) ) -- and 'rank.'
> On 25 Mar 2016, at 11:45 am, peter dalgaard <pda...@gmail.com> wrote: > >> >> On 25 Mar 2016, at 10:08 , Jari Oksanen <jari.oksa...@oulu.fi> wrote: >> >>> >>> On 25 Mar 2016, at 10:41 am, peter dalgaard <pda...@gmail.com> wrote: >>> >>> As I see it, the display showing the first p << n PCs adding up to 100% of >>> the variance is plainly wrong. >>> >>> I suspect it comes about via a mental short-circuit: If we try to control p >>> using a tolerance, then that amounts to saying that the remaining PCs are >>> effectively zero-variance, but that is (usually) not the intention at all. >>> >>> The common case is that the remainder terms have a roughly _constant_, >>> small-ish variance and are interpreted as noise. Of course the magnitude of >>> the noise is important information. >>> >> But then you should use Factor Analysis which has that concept of “noise” >> (unlike PCA). > > Actually, FA has a slightly different concept of noise. PCA can be > interpreted as a purely technical operation, but also as an FA variant with > same variance for all components. > > Specifically, FA is > > Sigma = LL' + Psi > > with Psi a diagonal matrix. If Psi = sigma^2 I , then L can be determined (up > to rotation) as the first p components of PCA. (This is used in ML algorithms > for FA since it allows you to concentrate the likelihood to be a function of > Psi.) > If I remember correctly, we took a correlation matrix and replaced the diagonal elements with variable “communalities” < 1 estimated by some trick, and then chunked that matrix into PCA and called the result FA. A more advanced way was to do this iteratively: take some first axes of PCA/FA, calculate diagonal elements from them & re-feed them into PCA. It was done like that because algorithms & computers were not strong enough for real FA. Now they are, and I think it would be better to treat PCA like PCA, at least in the default output of standard stats::summary function. 
So summary should show proportion of total variance (for people who think this is a cool thing to know) instead of showing a proportion of an unspecified part of the variance. Cheers, Jari Oksanen (who now switches to listening to today’s Passion instead of continuing with PCA) > Methods like PC regression are not being very specific about the model, but > the underlying line of thought is that PCs with small variances are > "uninformative", so that you can make do with only the first handful of > regressors. I tend to interpret "uninformative" as "noise-like" in these > contexts. > > -pd > >> >> Cheers, Jari Oksanen >> >>>> On 25 Mar 2016, at 00:02, Steve Bronder <sbron...@stevebronder.com> wrote: >>>> >>>> I agree with Kasper, this is a 'big' issue. Does your method of taking only >>>> n PCs reduce the load on memory? >>>> >>>> The new addition to the summary looks like a good idea, but Proportion of >>>> Variance as you describe it may be confusing to new users. Am I correct in >>>> saying Proportion of variance describes the amount of variance with respect >>>> to the number of components the user chooses to show? So if I only choose >>>> one I will explain 100% of the variance? I think showing 'Total Proportion >>>> of Variance' is important if that is the case. >>>> >>>> >>>> Regards, >>>> >>>> Steve Bronder >>>> Website: stevebronder.com >>>> Phone: 412-719-1282 >>>> Email: sbron...@stevebronder.com >>>> >>>> >>>> On Thu, Mar 24, 2016 at 2:58 PM, Kasper Daniel Hansen < >>>> kasperdanielhan...@gmail.com> wrote: >>>> >>>>> Martin, I fully agree. This becomes an issue when you have big matrices. >>>>> >>>>> (Note that there are awesome methods for actually only computing a small >>>>> number of PCs (unlike your code which uses svd which gets all of them); >>>>> these are available in various CRAN packages).
>>>>> >>>>> Best, >>>>> Kasper >>>>> >>>>> On Thu, Mar 24, 2016 at 1:09 PM, Martin Maechler < >>>>> maech...@stat.math.ethz.ch >>>>>> wrote: >>>>> >>>>>> Following from the R-help thread of March 22 on "Memory usage in prcomp", >>>>>> >>>>>> I've started [...]
Re: [Rd] summary( prcomp(*, tol = .) ) -- and 'rank.'
> On 25 Mar 2016, at 10:41 am, peter dalgaard <pda...@gmail.com> wrote: > > As I see it, the display showing the first p << n PCs adding up to 100% of > the variance is plainly wrong. > > I suspect it comes about via a mental short-circuit: If we try to control p > using a tolerance, then that amounts to saying that the remaining PCs are > effectively zero-variance, but that is (usually) not the intention at all. > > The common case is that the remainder terms have a roughly _constant_, > small-ish variance and are interpreted as noise. Of course the magnitude of > the noise is important information. > But then you should use Factor Analysis which has that concept of “noise” (unlike PCA). Cheers, Jari Oksanen >> On 25 Mar 2016, at 00:02, Steve Bronder <sbron...@stevebronder.com> wrote: >> >> I agree with Kasper, this is a 'big' issue. Does your method of taking only >> n PCs reduce the load on memory? >> >> The new addition to the summary looks like a good idea, but Proportion of >> Variance as you describe it may be confusing to new users. Am I correct in >> saying Proportion of variance describes the amount of variance with respect >> to the number of components the user chooses to show? So if I only choose >> one I will explain 100% of the variance? I think showing 'Total Proportion >> of Variance' is important if that is the case. >> >> >> Regards, >> >> Steve Bronder >> Website: stevebronder.com >> Phone: 412-719-1282 >> Email: sbron...@stevebronder.com >> >> >> On Thu, Mar 24, 2016 at 2:58 PM, Kasper Daniel Hansen < >> kasperdanielhan...@gmail.com> wrote: >> >>> Martin, I fully agree. This becomes an issue when you have big matrices. >>> >>> (Note that there are awesome methods for actually only computing a small >>> number of PCs (unlike your code which uses svd which gets all of them); >>> these are available in various CRAN packages).
>>> >>> Best,
>>> Kasper
>>>
>>> On Thu, Mar 24, 2016 at 1:09 PM, Martin Maechler <maech...@stat.math.ethz.ch> wrote:
>>>
>>>> Following from the R-help thread of March 22 on "Memory usage in prcomp",
>>>> I've started looking into adding an optional 'rank.' argument
>>>> to prcomp allowing to more efficiently get only a few PCs
>>>> instead of the full p PCs, say when p = 1000 and you know you
>>>> only want 5 PCs.
>>>> (https://stat.ethz.ch/pipermail/r-help/2016-March/437228.html)
>>>>
>>>> As it was mentioned, we already have an optional 'tol' argument
>>>> which allows *not* to choose all PCs.
>>>>
>>>> When I do that, say
>>>>
>>>>    C <- chol(S <- toeplitz(.9 ^ (0:31))) # Cov.matrix and its root
>>>>    all.equal(S, crossprod(C))
>>>>    set.seed(17)
>>>>    X <- matrix(rnorm(32000), 1000, 32)
>>>>    Z <- X %*% C ## ==> cov(Z) ~= C'C = S
>>>>    all.equal(cov(Z), S, tol = 0.08)
>>>>    pZ <- prcomp(Z, tol = 0.1)
>>>>    summary(pZ) # only ~14 PCs (out of 32)
>>>>
>>>> I get for the last line, the summary.prcomp(.) call:
>>>>
>>>>> summary(pZ) # only ~14 PCs (out of 32)
>>>> Importance of components:
>>>>                           PC1    PC2    PC3    PC4     PC5     PC6     PC7     PC8
>>>> Standard deviation     3.6415 2.7178 1.8447 1.3943 1.10207 0.90922 0.76951 0.67490
>>>> Proportion of Variance 0.4352 0.2424 0.1117 0.0638 0.03986 0.02713 0.01943 0.01495
>>>> Cumulative Proportion  0.4352 0.6775 0.7892 0.8530 0.89288 0.92001 0.93944 0.95439
>>>>                            PC9    PC10    PC11    PC12    PC13   PC14
>>>> Standard deviation     0.60833 0.51638 0.49048 0.44452 0.40326 0.3904
>>>> Proportion of Variance 0.01214 0.00875 0.00789 0.00648 0.00534 0.0050
>>>> Cumulative Proportion  0.96653 0.97528 0.98318 0.98966 0.99500 1.0000
>>>>
>>>> which computes the *proportions* as if there were only 14 PCs in
>>>> total (but there were 32 originally).
>>>>
>>>> I would think that the summary should or could in addition show
>>>> the usual "proportion of variance explained" like resu [...]
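The suggested alternative can be sketched in a few lines (a sketch of the idea only, not of any actual patch): the total variance of the data does not depend on how many PCs are retained, so it can be computed from the data rather than from the truncated sdev.

```r
## With 'tol', prcomp() truncates $sdev, so summary()'s proportions are
## relative to the retained components only and always cumulate to 1.
set.seed(17)
Z <- matrix(rnorm(32000), 1000, 32)
pZ <- prcomp(Z, tol = 0.9)           # keeps only the largest PCs
sum(pZ$sdev^2 / sum(pZ$sdev^2))      # 1 by construction: the criticized display
## Proportion of *total* variance instead:
tot <- sum(apply(Z, 2, var))         # total variance over all 32 dimensions
pZ$sdev^2 / tot                      # these no longer sum to 1
```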
Re: [Rd] Source code of early S versions
> On 29 Feb 2016, at 20:54 pm, Barry Rowlingson <b.rowling...@lancaster.ac.uk> > wrote: > > On Mon, Feb 29, 2016 at 6:17 PM, John Chambers <j...@r-project.org> wrote: >> The Wikipedia statement may be a bit misleading. >> >> S was never open source. Source versions would only have been available >> with a nondisclosure agreement, and relatively few copies would have been >> distributed in source. There was a small but valuable "beta test" network, >> mainly university statistics departments. > > So it was free (or at least distribution cost only), but with a > nondisclosure agreement? Did binaries circulate freely, legally or > otherwise? Okay, guess I'll read the book. >

I don’t think I have seen S source, but some other Bell software has a license of this type:

C     THIS INFORMATION IS PROPRIETARY AND IS THE
C     PROPERTY OF BELL TELEPHONE LABORATORIES,
C     INCORPORATED. ITS REPRODUCTION OR DISCLOSURE
C     TO OTHERS, EITHER ORALLY OR IN WRITING, IS
C     PROHIBITED WITHOUT WRITTEN PERMISSION OF
C     BELL LABORATORIES.
C     IT IS UNDERSTOOD THAT THESE MATERIALS WILL BE USED FOR
C     EDUCATIONAL AND INSTRUCTIONAL PURPOSES ONLY.

(Obviously in FORTRAN.) So the code was “open” in the sense that you could see the code, and it had to be “open”, because source code was the only way to distribute software before the era of widespread platforms allowing binary distributions (such as VAX/VMS or Intel/MS-DOS). However, the license in effect says that although you can see the code, you are not even allowed to tell anybody that you have seen it. I don’t know how this is interpreted currently, but you may ask the current owner, Nokia. Cheers, Jari Oksanen

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] building with tcltk on Ubuntu 14.04
On 28/05/2015, at 11:57 AM, Martin Maechler wrote: Ben Bolker bbol...@gmail.com on Tue, 26 May 2015 11:13:41 -0400 writes: False alarm. Completely wiping out my build directory followed by ../R-devel/configure --with-tcl-config=/usr/lib/tclConfig.sh --with-tk-config=/usr/lib/tkConfig.sh; make seems to work. (My fault for assuming repeated cycles of ./configure; make would actually do the right thing ...) There seems to be a corollary of Clarke's Law (any sufficiently advanced technology is indistinguishable from magic) that says that any sufficiently complex software system may *not* be magic, but it's just easier to treat it as though it is ... Thanks for the offer of help ... I also run several computers on Ubuntu 14.04 and never had to do anything special, I mean *no* --with-tcl-... or --with-tk-... were ever needed for me on 14.04 or earlier Ubuntus ... so I do wonder how you got into problems at all. I also have the same problem with Ubuntu (at least in 14.04, now in 15.04): ./configure does not find tcl/tk without --with-tcl-… and --with-tk-… They are in quite normal places, but still need manual setting. Currently I use something like --with-tcl-config=/usr/lib/tclConfig.sh --with-tk-config=/usr/lib/tkConfig.sh I need these explicit switches only when configure is overwritten. Normal compilation with ./configure works OK and finds Tcl/Tk, but a couple of times per year configure seems to change so much that I need to use these switches. I have had this problem for a couple of years. If I had to guess, I do something wrong and against instructions, and therefore I won't complain. Cheers, Jari Oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
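For reference, the switches mentioned in the thread as they would be typed (the tclConfig.sh/tkConfig.sh paths are those reported for Ubuntu 14.04 and may differ on other systems):

```shell
./configure --with-tcl-config=/usr/lib/tclConfig.sh \
            --with-tk-config=/usr/lib/tkConfig.sh
make
```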
Re: [Rd] vegan moved to GitHub and vegan 2.2-0 is coming (are you ready?)
Dear R-Devels, My apologies for using the wrong list. Please ignore my messages. I would undo this if I only could, but what's done can't be undone (not the first time in my life I've learnt this). Cheers, Jari Oksanen On 13/09/2014, at 08:13 AM, Jari Oksanen wrote: Dear vegan team, Vegan development now happens completely in GitHub. The R-Forge repository is no longer in sync with GitHub. I tried to commit all GitHub changes to R-Forge, but a week ago I got a conflict in a file and I haven't had time to resolve that conflict. You can follow vegan development and vegan discussion also without signing up to GitHub. The system seems to be completely open and does not require any user credentials (and this is good to remember when we discuss things). The developer front page is https://github.com/vegandevs We have prepared for the vegan 2.2-0 release. This would be a major release that would close the gap between the current development version and CRAN. I haven't set any firm date for the release, but I think R 3.2.0 will be out in October, and we should try to be concurrent with that -- in particular as the 2.0-10 in CRAN will give warnings in R check for that version. We have now solved a couple of major issues. - on the technical side: the next R release will spit out tens of warnings for namespace issues (visibility of functions). These were solved in https://github.com/vegandevs/vegan/pull/28 - all vegan functions doing permutation are now based on Gav's permute package. This means that they can use any constrained permutation scheme of the permute package. This also concerns functions that earlier only had simple permutation. Of course, you do not need to use fancy permutation schemes, but the default is still simple permutation and this can be expressed by giving just the number of permutations on the command line.
The functions using the new permutation scheme are adonis, anosim, anova.cca for CCA/RDA/dbRDA and hence also for ordistep etc., CCorA, envfit, mantel, mantel.partial, mrpp, mso, permutest.betadisper, permutest.cca, protest and simper. The change for functions is now complete, but some clean-up and updating of documentation is still to be done. This is discussed in https://github.com/vegandevs/vegan/issues/31 - vegan 2.2-0 will also use parallel processing in several functions. This was already done in several functions in vegan development. The discussion on extending parallel processing to other functions was just opened in https://github.com/vegandevs/vegan/issues/36 . Currently the following functions can use parallel processing: adonis, anosim, anova.cca, mantel, mantel.partial, mrpp and simper can use it in permutations, bioenv can assess several competing models in parallel, metaMDS can launch several random starts in parallel, and oecosimu can use parallel processing in evaluating the statistics for null communities. If you compare this to the previous list of permutation functions, you see that the following permutation methods do not use parallel processing: CCorA, envfit, mso, permutest.betadisper and protest. The question is whether these also should be parallelized, or whether we can leave them as they are, at least for the next release. - A more controversial issue is that Gav suggested moving rgl-based functions away from vegan to a separate package (https://github.com/vegandevs/vegan/issues/29 ). The main reason was that rgl can cause problems on several platforms and even prevent installing vegan. Indeed, when I tested these functions, they crashed on this Mac laptop. We now have a separate vegan3d package for these functions: https://github.com/vegandevs/vegan3d . In addition to ordirgl + friends, rgl.isomap and rgl.renyiaccum it also has the ordiplot3d function.
This package now has the same functionality as these functions had in vegan, and our purpose is to release it concurrently with vegan 2.2-0. I recently suggested removing these functions from vegan, but we haven't made that change yet so that you can express your opinion on the move. See https://github.com/vegandevs/vegan/pull/37 There are some simpler and smaller things, but you can see those if you follow GitHub. I have now mainly worked with my private fork of vegan and pushed changes to vegan upstream when they have looked more or less finished. At this stage, I have made a pull request, and normally waited for possible comments. To get a second opinion, I have usually waited for Gav to have a look at the functions and let him merge them to vegan. Sometimes there has been a long discussion before the merge and we have edited the functions before the merge (e.g., https://github.com/vegandevs/vegan/pull/34 ). If changes are small and isolated bug fixes, I have pushed them directly to the vegan upstream, though. I have found this a pretty good way of working in GitHub. Cheers, Jari Oksanen
Re: [Rd] historical significance of Pr(Chisq) < 2.2e-16
See ?format.pval cheers, jari oksanen

From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] on behalf of Michael Friendly [frien...@yorku.ca] Sent: 07 May 2014 17:02 To: r-devel Subject: [Rd] historical significance of Pr(Chisq) < 2.2e-16

Where does the value 2.2e-16 come from in p-values for chisq tests such as those reported below?

Anova(cm.mod2)
Analysis of Deviance Table (Type II tests)

Response: Freq
      LR Chisq Df Pr(Chisq)    
B      11026.2  1 < 2.2e-16 ***
W       7037.5  1 < 2.2e-16 ***
Age      886.6  8 < 2.2e-16 ***
B:W     3025.2  1 < 2.2e-16 ***
B:Age   1130.4  8 < 2.2e-16 ***
W:Age    332.9  8 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

-- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
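To spell the pointer out: format.pval() flags p-values smaller than its eps argument, which defaults to .Machine$double.eps, and that constant is where 2.2e-16 comes from. A short sketch:

```r
## The machine epsilon is the smallest double x with 1 + x != 1;
## p-values below it cannot be represented meaningfully, so they are
## flagged as "smaller than eps" rather than printed exactly.
.Machine$double.eps             # 2.220446e-16
format.pval(1e-300, digits = 2) # "< 2.2e-16", as in the anova table
format.pval(0.0321, digits = 2) # ordinary values print as usual
```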
Re: [Rd] [RFC] A case for freezing CRAN
Freezing CRAN solves no problem of reproducibility. If you know the sessionInfo() or the version of R, the packages used and their versions, you can reproduce that set-up. If you do not know, then you cannot. You can try to guess: the source code of old release versions of R and of old packages is in the CRAN archive, and these files have dates. So you can collect a snapshot of R and packages for a given date. This is not an ideal solution, but it is the same level of reproducibility that you get with a strictly frozen CRAN. CRAN is not the sole source of packages, and even with a strictly frozen CRAN the users may have used packages from other sources. I am sure that if CRAN were frozen (but I assume that happens the same day hell freezes over), people would increasingly often use other package sources than CRAN. The choice is easy if the alternatives are to wait for next year for the bug-fix release, or to do the analysis now and use package versions in R-Forge or GitHub. Then you could not assume that frozen CRAN packages were used. CRAN policy is not made in this mailing list, and the CRAN maintainers are so silent that it hurts the ears. However, I hope they won't freeze CRAN. Strict reproduction seems to be harder than I first imagined: ./configure; make really failed for R 2.14.1 and older on my office desktop. To reproduce an older analysis, I would also need to install older tool sets (I suspect gfortran and cairo libraries). CRAN is one source of R packages, and certainly its policy does not suit all developers. There is no policy that suits all. A frozen CRAN would suit some, but would certainly deter others. There seems to be a common sentiment here that the only reason anybody would use R older than 3.0.3 is to reproduce old results. My experience from Real Life(™) is that many of us use computers that we do not own; they are the property of our employer.
This may mean that we are not allowed to install any software there, or that we, or the Department or project, have to pay the computer administration for installing new versions of software (our case). This is often called security. Personally I avoid this by using a Mac laptop and a Linux desktop: these are not supported by the University computer administration and I can do what I please with them, but poor Windows users are stuck. Computer classes are also maintained by the centralized computer administration. This January they had a new R, but last year it was still two years old. However, users can install packages in their personal folders, so that they can use current packages even with an older R. Therefore I want to take care that the packages I maintain also run in older R. Therefore I also applaud the current CRAN policy where new versions of packages are backported to the previous R release: even if you are stuck with stale R, you need not be stuck with stale packages. Currently I cannot test with R older than 2.14.2, though, but I do that regularly and certainly before CRAN releases. If somebody wants to prevent this, they can set their package to unnecessarily depend on the current version of R. I would regard this as antisocial, but nobody would ask what I think about this, so it does not matter. The development branch of my package is in R-Forge, and only bug fixes and (hopefully) non-breaking enhancements (isolated so that they do not influence other functions, safe so that the API does not change and the format of the output does not change) are merged to the CRAN release branch. This policy was adopted because it fits the current CRAN policy, and would probably need to change if the CRAN policy changes. Cheers, Jari Oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
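The guessing procedure described above can be sketched concretely (the package name and version below are placeholders for illustration; the URL layout is the standard CRAN source archive):

```r
## Record what was used together with the analysis ...
sessionInfo()

## ... and later fetch a dated source version from the CRAN archive
## (hypothetical package/version):
pkg <- "https://cran.r-project.org/src/contrib/Archive/vegan/vegan_2.0-10.tar.gz"
install.packages(pkg, repos = NULL, type = "source")
```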
Re: [Rd] [RFC] A case for freezing CRAN
On 21/03/2014, at 10:40 AM, Rainer M Krug wrote: This is a long and (mainly) interesting discussion, which is fanning out in many different directions, and I think many are not that relevant to the OP's suggestion. I see the advantages of having such a dynamic CRAN, but also of having a more stable CRAN. I prefer CRAN as it is now, but in many cases a more stable CRAN might be an advantage. So having releases of CRAN might make sense. But then there is the archiving issue of CRAN. The suggestion was made to move the responsibility away from CRAN and the R infrastructure to the user / researcher to guarantee that the results can be re-run years later. It would be nice to have this built into CRAN, but let's stick with the scenario that the user should care for reproducibility. There are two different problems that alternate in the discussion: reproducibility and breakage of CRAN dependencies. A frozen CRAN could make *approximate* reproducibility easier to achieve, but real reproducibility needs stricter solutions. The actual sessionInfo() is minimal information, but re-building a spitting image of an old environment may still be demanding (though in many cases this does not matter). Another problem is that CRAN is so volatile that new versions of packages break other packages or old scripts. Here the main problem is how package developers work. Freezing CRAN would not change that: if package maintainers release breaking code, that would be frozen. I think that most packages do not make a distinction between development and release branches, and CRAN policy won't change that. I can sympathize with package maintainers having 150 reverse dependencies. My main package only has ~50, and it is sure that I won't test them all with a new release. I sometimes tried, but I could not even get all of those built because they had other dependencies on packages that failed. Even those that I could test failed to detect problems (in one case all examples were \dontrun and nicely passed the tests).
I only wish that if people *really* depend on my package, they test it against the R-Forge version and alert me before CRAN releases, but that is not very likely (I guess many dependencies are not *really* necessary, but only concern marginal features of the package, but CRAN forces one to declare those). Still a few words about reproducibility of scripts: this can hardly be achieved with good coverage, because many scripts are so very ad hoc. When I edit and review manuscripts for journals, I very often get Sweave or knitr scripts that just work, where "just" means just so and so. Often they do not work at all, because they had some undeclared private functionalities or stray files in the author's workspace that did not travel with the Sweave document. I think these -- published scientific papers -- are the main field where the code really should be reproducible, but they are often the hardest to reproduce. Nothing CRAN people do can help with the sloppy code scientists write for publications. You know, they are scientists -- not engineers. Cheers, Jari Oksanen Leaving the issue of compilation out, a package creating a custom installation which includes the source of the R version used and the sources of the packages in a format compilable on Linux, given that the relevant dependencies are installed, would be a huge step forward. I know - compilation on Windows (and sometimes Mac) is a serious problem - but to archive *all* binaries and to re-compile all older versions of R and all packages would be an impossible task. Apart from that, doing your analysis in a Virtual Machine and then simply archiving this Virtual Machine would also be an option, but only for the more tech-savvy users. In a nutshell: I think a package would be able to provide the solution for local archiving to make it possible to re-run the simulation with the same tools at a later stage - although guarantees would not be possible. Cheers, Rainer -- Rainer M.
Krug email: Raineratkrugsdotde PGP: 0x0F52F982

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [RFC] A case for freezing CRAN
On 20/03/2014, at 14:14 PM, S Ellison wrote: If we could all agree on a particular set of CRAN packages to be used with a certain release of R, then it doesn't matter how the 'snapshotting' gets implemented. This is pretty much the sticking point, though. I see no practical way of reaching that agreement without the kind of decision authority (and effort) that Linux distro maintainers put into the internal consistency of each distribution. CRAN doesn't try to do that; it's just a place to access packages offered by maintainers. As a package maintainer, I think support for critical version dependencies in the imports or dependency lists is a good idea that individual package maintainers could relatively easily manage, but I think freezing CRAN as a whole or adopting single release cycles for CRAN would be thoroughly impractical. I have a feeling that this discussion has floated between two different arguments in favour of freezing: discontent with package authors who break their packages within an R release cycle, and the ability to reproduce old results. In the beginning the first argument was more prominent, but now the discussion has drifted to reproducing old results. I cannot see how freezing CRAN would help with package authors who do not separate development and CRAN release branches but introduce broken code, or code that breaks other packages. Freezing a broken snapshot would only mean that the situation cannot be cured before the next R release, and then new breakage could be introduced. The result would be a dysfunctional CRAN. I think that quite a few of the package updates are bug fixes and minor enhancements. Further, I do think that these should be backported to previous versions of R: users of a previous version of R should also benefit from bug fixes. This also is the current CRAN policy, and I think this is a good policy.
Personally, I try to keep my packages in such a condition that they will also work in previous versions of R so that people do not need to upgrade R to have bug fixes in packages. The policy is the same with Linux maintainers: they do not just build a consistent release, but maintain the release by providing bug fixes. In Linux distributions, end of life equals freezing, or not providing new versions of software. Another issue is reproducing old analyses. This is a valuable thing, and sessionInfo and ability to get certain versions of package certainly are steps forward. It looks that guaranteed reproduction is a hard task, though. For instance, R 2.14.2 is the oldest version of R that I can build out of the box in my Linux desktop. I have earlier built older, even much older, R versions, but something has happened in my OS that crashes the build process. To reproduce an old analysis, I also should install an older version of my OS, then build old R and then get the old versions of packages. It is nice if the last step is made easier. Cheers, Jari Oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] cat with backspace and newline characters
On 07/11/2013, at 09:35, Renaud Gaujoux wrote: I agree that the handling of \b is not that strange, once one agrees on what \b actually means, i.e. go back one character and not delete the previous character. The fact that R GUI on Mac and Windows interprets/renders it differently shows that normality and strangeness are quite relative, though. As a user of a DEC LA120 terminal I expect the following: cat("a\b^\n") should print â, the caret overstruck on the a. Everything else feels like a bug. Cheers, Jari Oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
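As an aside for readers puzzled by the thread: a small Python sketch (hypothetical, not from the discussion; Python rather than R so it runs anywhere) showing that \b is simply the backspace control character stored in the string like any other character; what happens when it is *displayed* is entirely up to the output device:

```python
# "\b" is the backspace control character (ASCII 8). It is part of the
# string's data; interpretation happens only when a device renders it.
s = "a\b^"
print(len(s))     # 3 -- three characters: 'a', backspace, '^'
print(ord(s[1]))  # 8 -- the ASCII code for backspace

# A printing terminal such as the DEC LA120 overstrikes the caret on the
# 'a'; a screen terminal typically just moves the cursor back one column;
# some GUI consoles ignore \b entirely -- hence the differing renderings.
```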
Re: [Rd] Compatibility with R 2.15.x: Makefile for (non-Sweave) vignettes in vignettes/?
Henrik, On 14/10/2013, at 00:35, Henrik Bengtsson wrote: In R 3.1.0 (~April 2014), support for vignettes in inst/doc/ will go away (and probably much sooner for CRAN submission). I've been sticking with inst/doc/ for backward-compatibility reasons so that I can use a fallback inst/doc/Makefile for building *non*-Sweave vignettes also under R 2.15.x. AFAIK, it is not possible to put a Makefile under vignettes/, i.e. it is not possible to build a non-Sweave vignette under vignettes/. You can have a Makefile in vignettes/, and at the moment this even passes CRAN tests. You may also need to have a vignettes/.install_extras file to move the produced non-vignette files to their final packaged location. You still get warnings of unused, pointless and misleading files with R 2.15.3, because the R 3.0.2 packaging process makes files that R 2.15.3 regards as pointless and misleading. The CRAN policy seems to be to ignore those warnings. R is not backward compatible with herself, and I don't see much that a package author could do to work around this (apart from forking the package). Cheers, Jari O. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] version comparison puzzle
Actually, Bob O'Hara had a blog post about this in August 2012: http://occamstypewriter.org/boboh/2012/08/17/lme4_destined_to_become_stable_through_rounding/ The concluding paragraph reads: I have been worried that lme4 will never become stable, but this latest version mollifies me with the thought that the developers can’t go on forever, so eventually lme4 will become stable when the machine precision forces it to be rounded up to 1.0 Cheers, Jari Oksanen From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] on behalf of Martyn Plummer [plumm...@iarc.fr] Sent: 03 October 2013 11:15 To: Ben Bolker Cc: r-de...@stat.math.ethz.ch Subject: Re: [Rd] version comparison puzzle It's an underflow problem. When comparing versions, a.b.c is converted first to the integer vector c(a, b, c) and then to the double precision value a + b/base + c/base^2, where base is 1 greater than the largest integer component of any of the versions: i.e. 9912 in this case. The last term is then smaller than the machine precision so you can't tell the difference between 1.0.4 and 1.0.5. Martyn On Wed, 2013-10-02 at 23:41 -0400, Ben Bolker wrote: Can anyone explain what I'm missing here? max(pp1 <- package_version(c("0.9911.3", "1.0.4", "1.0.5"))) ## [1] ‘1.0.4’ max(pp2 <- package_version(c("1.0.3", "1.0.4", "1.0.5"))) ## [1] ‘1.0.5’ I've looked at ?package_version, to no avail. Since max() goes to .Primitive("max") I'm having trouble figuring out where it goes from there: I **think** this is related to ?xtfrm, which goes to .encode_numeric_version, which is doing something I really don't understand (it's in base/R/version.R ...) 
.encode_numeric_version(pp1) ## [1] 1 1 1 ## attr(,"base") ## [1] 9912 ## attr(,"lens") ## [1] 3 3 3 ## attr(,".classes") ## [1] "package_version" "numeric_version" .encode_numeric_version(pp2) ## [1] 1.083333 1.111111 1.138889 ## attr(,"base") ## [1] 6 ## attr(,"lens") ## [1] 3 3 3 ## attr(,".classes") ## [1] "package_version" "numeric_version" sessionInfo() R Under development (unstable) (2013-09-09 r63889) Platform: i686-pc-linux-gnu (32-bit) [snip] attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_3.1.0 tools_3.1.0 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
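Martyn's mechanism can be sketched numerically. A short Python illustration (hypothetical values, not R's actual encoder; the large base here is invented to make the effect visible): once the base is big enough, the last version component contributes less than half an ulp of the leading term and is rounded away, so two distinct versions encode to the same double:

```python
def encode(a, b, c, base):
    """Mimic the a + b/base + c/base**2 encoding described in the thread."""
    return a + b / base + c / base ** 2

# With a modest base the trailing component still matters:
small = 10
assert encode(1, 0, 4, small) != encode(1, 0, 5, small)

# With a huge base, c/base**2 falls below half an ulp of 1.0 (~1.1e-16),
# so both 1.0.4 and 1.0.5 round to exactly the same double:
huge = 10 ** 9
print(encode(1, 0, 4, huge) == encode(1, 0, 5, huge))  # True
print(encode(1, 0, 4, huge) == 1.0)                    # True: term vanished
```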
Re: [Rd] question on why Rigroup package moved to Archive on CRAN
What we've got here is failure to communicate. Some men you just can't reach. So you get what we had here last week, which is the way he wants it. Well, he gets it. I don't like it any more than you men. (from Cool Hand Luke -- but whose fault?) Cheers, Jari Oksanen On 10/03/2013, at 17:18, Uwe Ligges wrote: I wonder why you do not ask on CRAN@...? List members here cannot know the answer. And we typically do not discuss such matters in public. I wonder why you do not read the e-mail message you got from the CRAN team? Please see the message with subject line Registering .External entry points you got on January 20. You never answered nor fixed the package, hence the package has been archived. Best, Uwe Ligges On 10.03.2013 02:43, Kevin Hendricks wrote: Hi Dan, In case this catches anyone else ... FWIW, I found the issue ... in my Rinit.c, my package uses the .External call which actually takes one SEXP which points to a varargs-like list. Under 2.15.X and earlier, I thought the proper entry for an .External call was as below since it only does take one pointer as an argument: #include "Rigroup.h" /* Automate using sed or something. */ #if _MSC_VER >= 1000 __declspec(dllexport) #endif static const R_ExternalMethodDef R_ExtDef[] = { {"igroupFuns", (DL_FUNC) igroupFuns, 1}, {NULL, NULL, 0}, }; void R_init_Rigroup(DllInfo *info) { R_registerRoutines(info, NULL, NULL, NULL, R_ExtDef); } But now according to the latest online docs on building your own package it says: For routines with a variable number of arguments invoked via the .External interface, one specifies -1 for the number of arguments, which tells R not to check the actual number passed. Note that the number of arguments passed to .External are not currently checked but they will be in R 3.0.0. So I need to change my Rinit.c to change the 1 to a -1 and that error should go away. Thanks again for all your help with this. 
I will update my package and resubmit it once version 3.0 gets released and I get a chance to verify that this does in fact fix the problem. Kevin __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa
Re: [Rd] Keeping up to date with R-devel
On 27/02/2013, at 18:08, Dirk Eddelbuettel wrote: On 27 February 2013 at 17:16, Renaud wrote: | Hi, | | thanks for the responses. | Dirk I found the script you posted once. Can anyone send me a link to the | beaten to death post? Those were Simon's words, not mine, but I think he referred to the long-ish and painful thread here: http://thread.gmane.org/gmane.comp.lang.r.devel/32779 Feel free to ignore the title, and most assertions by the OP which were never replicated by anybody else. The do not build in src mantra was repeated a few times, and as I recall also refuted once (not by me). That is not a topic I care much about; I use a shortcut, am aware of its (theoretical?) limits but for the casual R CMD check use I get out of R-devel never had an issue. FWIW, I also build in src, at least twice weekly. It is a bit scary to confess this, but I'll duck and cover and I hope they will not catch me. This is a no-no, and if you run into trouble, you shall not make noise, but you have got to clean up your mess all by yourself. I even didn't know about distclean, but I do manual cleaning. When I run into trouble, the message is usually that there is no rule to build 'x' from 'z'. So I go to the offending directory (folder for Windows users), check which files are not under version control (svn st), remove those, and run ./configure && make. It has worked so far. The day it won't work, I'll remove my old src and start from square one with a virgin checkout, following the instructions. This has not happened yet, and I have done this for several months, over a year (I'm afraid that day of destruction is drawing nigh: this abomination must be stopped). I only do this in my home directory on my office desktop. I don't run make install, but I have a symbolic link in ~/bin to the built binary in the build directory so that I can either use the stock R of my system (which still runs in the 2.14 series) with stable packages, or experimental R with experimental versions of packages. 
I think the rule is that you can do anything as long as you don't complain. If you want to complain, you must follow the instructions. Cheers, Jari Oksanen -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] It's a BUG or a Feature? Generating seq break comparing operators
This should be FAQ 0.0. No other thing is asked as frequently as this. This is the FAQest of all FAQs, and the mother of all FAQs. At least this should be in the R posting guide: Read FAQ 7.31 before posting! Cheers, Jari Oksanen On 07/02/2013, at 12:13, R. Michael Weylandt wrote: R FAQ 7.31 Cheers, MW On Thu, Feb 7, 2013 at 10:05 AM, Davide Rambaldi davide.ramba...@ieo.eu wrote: Hello everybody: I get a strange behavior with seq, take a look at this: msd <- seq(0.05, 0.3, 0.01) msd[13] [1] 0.17 class(msd) [1] "numeric" class(msd[13]) [1] "numeric" typeof(msd[13]) [1] "double" now the problem: msd[13] == 0.17 [1] FALSE It is strange only to me? Consider that: 0.17 == 0.17 [1] TRUE and also a <- c(0, 1, 0.17) a [1] 0.00 1.00 0.17 a[3] == 0.17 [1] TRUE Is it a BUG in seq? I suspect something related to doubles … sessionInfo(): R version 2.15.2 (2012-10-26) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_2.15.2 --- PLEASE NOTE MY NEW EMAIL ADDRESS --- - Davide Rambaldi, PhD. - IEO ~ MolMed [e] davide.ramba...@ieo.eu [e] davide.ramba...@gmail.com __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
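FAQ 7.31 boils down to binary floating point: most decimal fractions have no exact double representation, so a value reached by arithmetic can differ in the last bits from the literal you typed. The same effect in a short Python sketch (the classic 0.1 + 0.2 case rather than R's seq, plus the tolerance-based comparison the FAQ recommends):

```python
import math

# Neither 0.1 nor 0.2 is exactly representable as a binary double,
# so their sum is not the double nearest to 0.3.
x = 0.1 + 0.2
print(x == 0.3)   # False: the two doubles differ in the last bit
print(repr(x))    # 0.30000000000000004

# Compare with a tolerance instead of ==, as FAQ 7.31 advises:
print(math.isclose(x, 0.3))  # True
```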
Re: [Rd] R CMD check not reading R_LIBS from ~/.R/check.Renviron
Gav, This is off-list since I only wonder what you are trying to do. It seems to me that you're trying something much too elegant and complicated. One thing that I have learnt to avoid is mucking about with environments: that is calling for trouble. One day they will change R and you will get errors that are very difficult to track (I had that once with gnome: I did not edit the config file manually, but only used configuration GUI tools, but then gnome changed and I could not start X11 after a distro upgrade -- and it was difficult to track the reason). What I do with R-devel and R release co-existence is that I keep them completely separate. I do not install (make install) R-devel, but I leave it in its working directory. I have now a symbolic link in ~/bin ($HOME/bin): cd ~/bin && ln -s ~/R-devel/bin/R R3 So when I want to run R 3.0.0 I use 'R3', and when I want to use the stock R of my distro (now 2.15.1) I use R. These are completely different beasts, and packages are installed separately for each, so that they are the 3.0.0 versions in R3 and the 2.15 versions in R. I don't edit environments, but always use defaults. With this setup, analogue checks smoothly with the only comment coming from the examples: * checking differences from ‘analogue-Ex.Rout’ to ‘analogue-Ex.Rout.save’ ... 
2c2 This is vegan 2.1-23 --- This is vegan 2.0-5 1104d1103 Warning: argument 'tol.dw' is not used (yet) 4894d4892 Warning: argument 'tol.dw' is not used (yet) 4933d4930 Warning: argument 'tol.dw' is not used (yet) 6139d6135 Warning: argument 'tol.dw' is not used (yet) 7078d7073 Warning: argument 'tol.dw' is not used (yet) 7116d7110 Warning: argument 'tol.dw' is not used (yet) OK Cheers, Jari From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] on behalf of Gavin Simpson [gavin.simp...@ucl.ac.uk] Sent: 16 January 2013 22:13 To: R Devel Mailing List Subject: [Rd] R CMD check not reading R_LIBS from ~/.R/check.Renviron Dear List, Further to my earlier email, I note that, for me at least, R CMD check is *not* reading R_LIBS from ~/.R/check.Renviron on R 2.15.2 patched (r61228) and R Under Development (r61660). The only way I can get R CMD check to look for packages in a user-supplied library is by explicitly exporting R_LIBS set to the relevant directory. R CMD build *does* read R_LIBS from ~/.R/build.Renviron for the same versions of R on the same Fedora 16 laptop. So I am in the strange situation of being able to build but not check a source package having followed the instructions in Writing R Extensions. I have tried exporting R_CHECK_ENVIRON via export R_CHECK_ENVIRON=/home/gavin/.R/check.Renviron and that doesn't work either. ~/.R/check.Renviron contains: R_LIBS=/home/gavin/R/libs/ #R_LIBS=/home/gavin/R/devlibs/ Anyone suggest how/where I am going wrong? More complete system info follows below. 
TIA Gavin sessionInfo() R version 2.15.2 Patched (2012-12-05 r61228) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C [3] LC_TIME=en_GB.utf8LC_COLLATE=en_GB.utf8 [5] LC_MONETARY=en_GB.utf8LC_MESSAGES=en_GB.utf8 [7] LC_PAPER=CLC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods [7] base sessionInfo() R Under development (unstable) (2013-01-16 r61660) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C [3] LC_TIME=en_GB.utf8LC_COLLATE=en_GB.utf8 [5] LC_MONETARY=en_GB.utf8LC_MESSAGES=en_GB.utf8 [7] LC_PAPER=CLC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods [7] base -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] prcomp with previously scaled data: predict with 'newdata' wrong
Hello folks, it may be regarded as a user error to scale() your data prior to prcomp() instead of using its 'scale.' argument. However, it is a user thing that may happen and sounds a legitimate thing to do, but in that case predict() with 'newdata' can give wrong results: x <- scale(USArrests) sol <- prcomp(x) all.equal(predict(sol), predict(sol, newdata = x)) ## [1] "Mean relative difference: 0.9033485" Predicting with the same data gives different results than the original PCA of the data. The reason for this behaviour seems to be in these first lines of stats:::prcomp.default(): x <- scale(x, center = center, scale = scale.) cen <- attr(x, "scaled:center") sc <- attr(x, "scaled:scale") If the input data 'x' has a 'scaled:scale' attribute, it will be retained if scale() is called with argument scale = FALSE, as is the case with the default options in prcomp(). So scale(scale(x, scale = TRUE), scale = FALSE) will have the 'scaled:center' of the outer scale() (i.e., numerical zero), but the 'scaled:scale' of the inner scale(). Function princomp() finds the 'scale' directly instead of looking at the attributes of the input data, and works as expected: sol <- princomp(x) all.equal(predict(sol), predict(sol, newdata = x)) ## [1] TRUE I don't have any nifty solution to this -- only checking the 'scale.' argument and acting accordingly: sc <- if (scale.) attr(x, "scaled:scale") else FALSE Cheers, Jari Oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] prcomp with previously scaled data: predict with 'newdata' wrong
To fix myself: the stupid solution I suggested won't work, as 'scale.' need not be TRUE or FALSE, but can be a vector of scales. The following looks like being able to handle this, but is neither transparent nor elegant: sc <- if (isTRUE(scale.)) attr(x, "scaled:scale") else scale. I trust you will find an elegant solution (if you think this is worth fixing). Cheers, Jari Oksanen PS. Sorry for the top posting: cannot help it with the email system I have on my work desktop. From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] on behalf of Jari Oksanen [jari.oksa...@oulu.fi] Sent: 23 May 2012 13:51 To: r-de...@stat.math.ethz.ch Subject: [Rd] prcomp with previously scaled data: predict with 'newdata' wrong Hello folks, it may be regarded as a user error to scale() your data prior to prcomp() instead of using its 'scale.' argument. However, it is a user thing that may happen and sounds a legitimate thing to do, but in that case predict() with 'newdata' can give wrong results: x <- scale(USArrests) sol <- prcomp(x) all.equal(predict(sol), predict(sol, newdata = x)) ## [1] "Mean relative difference: 0.9033485" Predicting with the same data gives different results than the original PCA of the data. The reason for this behaviour seems to be in these first lines of stats:::prcomp.default(): x <- scale(x, center = center, scale = scale.) cen <- attr(x, "scaled:center") sc <- attr(x, "scaled:scale") If the input data 'x' has a 'scaled:scale' attribute, it will be retained if scale() is called with argument scale = FALSE, as is the case with the default options in prcomp(). So scale(scale(x, scale = TRUE), scale = FALSE) will have the 'scaled:center' of the outer scale() (i.e., numerical zero), but the 'scaled:scale' of the inner scale(). 
Function princomp() finds the 'scale' directly instead of looking at the attributes of the input data, and works as expected: sol <- princomp(x) all.equal(predict(sol), predict(sol, newdata = x)) ## [1] TRUE I don't have any nifty solution to this -- only checking the 'scale.' argument and acting accordingly: sc <- if (scale.) attr(x, "scaled:scale") else FALSE Cheers, Jari Oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
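The mechanism of the bug, metadata from an earlier transformation surviving a later one, is easy to reproduce outside R. A Python sketch (a toy stand-in for R's scale(), not the real code; the function and attribute names only mirror R's):

```python
def scale(data, center=True, scale=True, attrs=None):
    """Toy version of R's scale(): standardize a column and record what was
    done in an attribute dict carried alongside the data, as R does."""
    attrs = dict(attrs or {})       # inherit attributes from the input
    n = len(data)
    if center:
        mu = sum(data) / n
        data = [v - mu for v in data]
        attrs["scaled:center"] = mu
    if scale:
        sd = (sum(v * v for v in data) / (n - 1)) ** 0.5
        data = [v / sd for v in data]
        attrs["scaled:scale"] = sd
    return data, attrs

x, a1 = scale([1.0, 2.0, 3.0, 4.0])      # user scales first...
y, a2 = scale(x, scale=False, attrs=a1)  # ...then prcomp-style scale=FALSE
# The second call never rescaled, yet the old 'scaled:scale' is still
# present in a2 -- exactly the stale attribute prcomp() then picks up.
print("scaled:scale" in a2)  # True: the old scale attribute survived
```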
Re: [Rd] --as-cran and need to ignore.svn directories
On 12/03/2012, at 18:03, Paul Johnson wrote: Good morning: I submitted a package update to CRAN and got a bounce because I had not run R CMD check with --as-cran. I'd not heard of that before, but I'm glad to know about it now. I see it warns when my functions do use partial argument matching, and I like that advice very much. Also I see this warning * checking package subdirectories ... WARNING Found the following directory(s) with names of version control directories: ./.svn ./R/.svn ./data/.svn ./inst/.svn ./inst/doc/.svn ./inst/examples/.svn ./vignettes/.svn These should not be in a package tarball. Is there a way to cause R to ignore the .svn folders while running R CMD check --as-cran or R CMD build? It seems a little tedious to have to copy the whole directory tree to some other place and remove the .svn folders before building. I can do it, but it just seems, well, tedious. I have the feeling that you frequent flyers would have worked around this already. Paul, I think the best solution is to 'svn export' the svn directory to a temporary directory/folder: svn export my-svn-directory tmp-pkg-directory; R CMD build tmp-pkg-directory; R CMD check --as-cran ... The two advantages of 'svn export' are that it (1) strips the .svn specific files, and (2) really exports only those files that are under version control. More than once I have had some non-svn files in my svn directory so that *my* version of the package works, but the one actually in subversion fails. Cheers, Jari Oksanen -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] NOTE: unstated dependencies in examples
Hello folks, To get it short, I cut out most of the spurious controversy and go to the key point (it also helps to go to sauna and then sleep well all night): On 14/10/11 22:30, Uwe Ligges lig...@statistik.tu-dortmund.de wrote: Also note that the package would be accepted on CRAN as is, if you declared parallel as a Suggests, as far as I understand Jari. At least binaries for Windows for old R versions will be built, since I am checking with _R_CHECK_FORCE_SUGGESTS_=FALSE on Windows. Therefore, I believe (I haven't seen the package) this discussion is meaningless anyway. This is fine and solves the problems I anticipated. I did not know about this possibility. It was not shown in R CMD check --help, nor in the usual manuals I read: it seems to be mentioned in R-ints.texi, but not in R-exts.texi nor in R-admin.texi. Although I feel well at the moment, I return to the old theme: the kind of keyword describing packages that you don't necessarily need, and which are used in the style if(require(foo)) {do_something_fancy_with_foo::foo()} They are Sugar: parallel, foo. They are not necessarily needed, and if you don't have them you don't necessarily even know you need them. Then about old R and new packages: many of us are in situations where we must use an old version of R. However, we can still install packages in private libraries without admin privileges. They may not be system-wide, and they can be wiped out in the next boot, or you may need to have them on your USB stick, but installing a package is a rather light operation which can be done except in the most paranoid systems. One year I burned an R installation to a CD that I distributed to the students so that they could run R in a PC class with a too ancient R. On one occasion I gave students temporary usernames to my personal Linux desktop so that they could log in to my desktop from the class for one analysis (but that is in general too clumsy, as Windows did not have good X11). 
New package versions can contain bug fixes and some enhanced functionality in addition to radical new features that require bleeding-edge R. Personally, I try to keep my release versions such that they work in the current, previous and future major versions of R. Currently I test the package more or less regularly in R 2.13.2 and R-to-be-2.14.0 on MacOS, and in 2.12.2 and R-to-be-2.15.0 on Linux, and I expect the release version to pass all these. The development version can fail in older R, but then we (the team) must judge if we merge such failing features to the release. Cheers, Jari Oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] NOTE: unstated dependencies in examples
On Thu, 2011-10-13 at 17:34 +0200, Uwe Ligges wrote: I looked at the code and since this is not that trivial to change, I think we can well live with typing grep -r gplots ./man which is not too hard to run on the source package, I believe. Best wishes, Uwe Uwe and others, This is OK if you want to identify the cause of the problems. However, the basic problem was that checking required something that is not required: there was one example that was not run, and one case where the loading of the package was not necessary (if(require(package))). I do believe that handling this kind of case is difficult in automatic checking. However, I think they need not be checked: there should be a new case of package reference in addition to 'depends', 'suggests' and 'enhances' -- something like 'benefitsfrom'. This is now topical for me, since I'm adding 'parallel' support to my package, but there seems to be no clean way of doing this with the current checking procedures. I use the 'parallel' support only if the package is available (in R >= 2.14.0, not yet released), and there are multiple cores. If there is only one CPU or there is no 'parallel' package yet, nothing bad will happen: things will only work like they worked earlier without the 'parallel' package. I haven't found out how to do this cleanly for R CMD check (it is clean for my code, since there the usage is checked). If I add Suggests: parallel I get an R CMD check error for the current and previous R -- for no reason. So currently I don't mention 'parallel' at all in DESCRIPTION: I get a NOTE and Warnings ('require' call not declared, no visible definitions), but this is a smaller problem than having a spurious failure, and failing to have this package for a system where it works quite normally. The new DESCRIPTION keyword could be used for packages that are useful but not necessary, so that the package can quite well be used without these packages, but it may have some extra options or functionality with those packages. 
This sounds like a suggestion to me, but in R language suggestions cannot be refused. Cheers, jari oksanen On 13.10.2011 03:00, Yihui Xie wrote: You have this in Jevons.Rd: # show as balloonplots if (require(gplots)) { and this in Snow.Rd: %\dontrun{ library(sp) It will certainly be helpful if R CMD check can provide more informative messages (in this case, e.g, point out the Rd files). Regards, Yihui -- Yihui Xiexieyi...@gmail.com Phone: 515-294-2465 Web: http://yihui.name Department of Statistics, Iowa State University 2215 Snedecor Hall, Ames, IA On Wed, Oct 12, 2011 at 11:33 AM, Michael Friendlyfrien...@yorku.ca wrote: Using R 2.13.1, I am now getting the following NOTE when I run R CMD check on my HistData package * checking for unstated dependencies in examples ... NOTE 'library' or 'require' calls not declared from: gplots sp Under R 2.12.x, I didn't get these notes. I have ~ 25 .Rd files in this package, and AFAICS, every example uses library or require for the functions used; the DESCRIPTION file has the long list of Suggests, which previously was sufficient for packages used in examples. Suggests: gtools, KernSmooth, maps, ggplot2, proto, grid, reshape, plyr, lattice, ReadImages, car But I have no way to find the .Rd file(s) that triggered this note. What is the tool used in R CMD check to make this diagnosis? It would be better if this reported the .Rd file(s) that triggered this note. Is it possible that this note could be specious? -Michael -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. 
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
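The pattern Jari describes for 'parallel', use the optional facility when it is present and fall back silently otherwise, has the same shape in any language. A Python analogue (hypothetical, just to illustrate the check; the helper names are mine, not from the thread):

```python
import importlib.util
import os

def have_module(name):
    """True when an optional dependency is importable, without importing it."""
    return importlib.util.find_spec(name) is not None

def cpu_count_or_one():
    return os.cpu_count() or 1

# Use the fancy path only when both the module and multiple CPUs exist;
# otherwise behave exactly as the code did before the feature was added.
if have_module("multiprocessing") and cpu_count_or_one() > 1:
    mode = "parallel"
else:
    mode = "serial"
print(mode in ("parallel", "serial"))  # True: a valid mode either way
```

The point of the `if(require(foo))` idiom, and of this sketch, is that the absence of the optional dependency is not an error: the code simply keeps its old behaviour.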
Re: [Rd] NOTE: unstated dependencies in examples
On 14/10/11 16:26, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 14/10/2011 9:18 AM, Jari Oksanen wrote: Uwe and others, This is OK if you want to identify the cause of the problems. However, the basic problem was that checking required something that is not required: there was one example that was not run, and one case where the loading of the package was not necessary (if(require(package))). I do believe that handling this kind of case is difficult in automatic checking. However, I think they need not be checked: there should be a new case of package reference in addition to 'depends', 'suggests' and 'enhances' -- something like 'benefitsfrom'. Users use those declarations when they ask to install dependencies. If you don't declare a dependence on a contributed package, users will have to manually install it. Howdy, This is a pretty weak argument in this particular case: 'parallel' is not a contributed package, so you cannot install it. You either have it or you don't have it. In the latter case, nothing happens, but everything works like usual. In the former case, you may have some new things. (Having 'parallel' as a contributed package for R < 2.14.0 would be a great idea but not something I dare to suggest.) This is now topical for me, since I'm adding 'parallel' support to my package, but there seems to be no clean way of doing this with the current checking procedures. I use the 'parallel' support only if the package is available (in R >= 2.14.0, not yet released), and there are multiple cores. Temporarily maintain two releases of your package: one for R < 2.14.0 that doesn't mention parallel, and one for R >= 2.14.0 that does. The second one should declare its dependence on R >= 2.14.0. If support for parallel is your only change, you don't need to do anything for the previous one: CRAN will not replace it in the 2.13.x repository if the new one needs a newer R. Forking my package was indeed one of the three alternatives I have considered. 
In this case forking sounds really weird: for R 2.13.0 both forks would work identically. The only difference being how they are handled by R checkers. Cheers, Jari Oksanen -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] NOTE: unstated dependencies in examples
On 14/10/11 19:00, Uwe Ligges lig...@statistik.tu-dortmund.de wrote: On 14.10.2011 16:15, Duncan Murdoch wrote: On 14/10/2011 10:10 AM, Jari Oksanen wrote: On 14/10/11 16:26, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 14/10/2011 9:18 AM, Jari Oksanen wrote: Uwe and others, This is OK if you want to identify the cause of the problems. However, the basic problem was that checking required something that is not required: there was one example that was not run, and one case where the loading of the package was not necessary (if(require(package))). I do believe that handling this kind of case is difficult in automatic checking. However, I think they need not be checked: there should be a new case of package reference in addition to 'depends', 'suggests' and 'enhances' -- something like 'benefitsfrom'. Users use those declarations when they ask to install dependencies. If you don't declare a dependence on a contributed package, users will have to manually install it. Howdy, This is a pretty weak argument in this particular case: 'parallel' is not a contributed package, so you cannot install it. You either have it or you don't have it. In the latter case, nothing happens, but everything works like usual. In the former case, you may have some new things. (Having 'parallel' as a contributed package for R < 2.14.0 would be a great idea but not something I dare to suggest.) This is now topical for me, since I'm adding 'parallel' support to my package, but there seems to be no clean way of doing this with the current checking procedures. I use the 'parallel' support only if the package is available (in R >= 2.14.0, not yet released), and there are multiple cores. Temporarily maintain two releases of your package: one for R < 2.14.0 that doesn't mention parallel, and one for R >= 2.14.0 that does. The second one should declare its dependence on R >= 2.14.0. 
If support for parallel is your only change, you don't need to do anything for the previous one: CRAN will not replace it in the 2.13.x repository if the new one needs a newer R. Forking my package was indeed one of the three alternatives I have considered. In this case forking sounds really weird: for R 2.13.0 both forks would work identically. The only difference is how they are handled by the R checkers. I don't see why it's weird to require that a version that uses a facility that is in 2.14.0 but no earlier versions should have to declare that. Sure, you can put all sorts of conditional tests into your code so that it avoids using the new facility in older versions, but isn't it simpler to just declare the dependency and avoid cluttering your code with those tests? Indeed, I think you should update your package and declare the dependency on R >= 2.14.0. This seems to be the cleanest possible approach. Distributing a contributed parallel package without the functionality for R < 2.14.0 is not; why should anybody develop code for R versions that won't be supported any more in due course? Here is one reason: our PC labs now have R version 2.12.something, and it is not in my power to upgrade R; that depends on the will of our computing centre. If it is upgraded, it will not be to 2.14.something. A simple desire to be able to use the package in the environment where I work sounds a valid personal reason. A second point is that the package would not *depend* on R >= 2.14.0. It could be faster in some cases, but not in all. It would be just as legitimate to have a condition that the package cannot be used by those poor sods who have but one processor (and I was one just a short time ago). Indeed, this is exactly the same condition: you *must* have the hardware I want you to have, and the version of R I want you to have. I won't make that requirement. Like I wrote in my previous message, I had considered three choices.
One was forking, another was delaying the release of these features till 2.14.* is old, and the third was to depend on 'snow' *and* 'multicore' instead of 'parallel'. Now the second choice sounds the best. Cheers, Jari Oksanen -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa
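As an aside, the conditional-use approach discussed in this thread can be sketched roughly as follows. This is an illustration, not code from the thread, and it uses the requireNamespace() idiom from later R versions (the original discussion predates it); the function name run.jobs is hypothetical:

```r
## Sketch: use 'parallel' only when it is available and there are multiple
## cores, falling back to plain lapply() otherwise. mclapply() relies on
## fork(), so the parallel branch is only useful on Unix-alikes.
run.jobs <- function(X, FUN) {
  if (getRversion() >= "2.14.0" &&
      .Platform$OS.type != "windows" &&
      requireNamespace("parallel", quietly = TRUE) &&
      parallel::detectCores() > 1) {
    parallel::mclapply(X, FUN)   # parallel branch
  } else {
    lapply(X, FUN)               # serial fallback, identical results
  }
}
run.jobs(1:4, function(i) i^2)
```

Either branch returns the same list, which is what makes the dependency genuinely optional rather than a hard requirement.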
Re: [Rd] Standardized Pearson residuals
On 15/03/11 13:17 PM, peter dalgaard pda...@gmail.com wrote: On Mar 15, 2011, at 04:40 , Brett Presnell wrote: Background: I'm currently teaching an undergrad/grad-service course from Agresti's Introduction to Categorical Data Analysis (2nd edn) and deviance residuals are not used in the text. For now I'll just provide the students with a simple function to use, but I prefer to use R's native capabilities whenever possible. Incidentally, chisq.test will have a stdres component in 2.13.0 for much the same reason. Thank you. That's one more thing I won't have to provide code for anymore. Coincidentally, Agresti mentioned this to me a week or two ago as something that he felt was missing, so that's at least two people who will be happy to see this added. And of course, I was teaching a course based on Agresti & Franklin: Statistics, The Art and Science of Learning from Data, when I realized that R was missing standardized residuals. So nobody uses McCullagh & Nelder: Generalized Linear Models in teaching, since they don't realize that R is missing Anscombe residuals, too? Cheers, Jari Oksanen
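For reference, the standardized Pearson residuals discussed here can be computed by hand from the Pearson residuals and the hat values; in R >= 2.13.0 rstandard() also accepts type = "pearson" directly. A sketch (not code from the thread), using the Poisson example from ?glm:

```r
## Dobson's Poisson regression example, used purely for illustration
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

## standardized Pearson residual: r_i / sqrt(1 - h_i)
r.std <- residuals(fit, type = "pearson") / sqrt(1 - hatvalues(fit))
## in R >= 2.13.0 this should agree with:
## rstandard(fit, type = "pearson")
```

The division by sqrt(1 - h_i) is what distinguishes the standardized residuals from the plain Pearson residuals.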
Re: [Rd] postscript failure manifests in plot.TukeyHSD
On 16/12/10 04:24 AM, Paul Murrell p.murr...@auckland.ac.nz wrote: Hi According to the PostScript Language Reference Manual and the PDF Reference, in both PDF and PostScript ... ... a line width of zero is valid, but not recommended (and is clearly not supported by some viewers). ... a line dash pattern cannot be specified as all zero lengths. (So, because R generates the line dash pattern proportional to the line width, a specification of lwd=0 and lty=anything-other-than-solid-or-none does not make sense.) I think three fixes are required: (i) Enforce a minimum line width of 0.01 (mainly because that is not zero, but also because that is the smallest value greater than zero when you round to 2dp like the PDF and PostScript devices do, and it's still REALLY thin). (ii) If the line dash pattern ends up as all zeroes (to 2dp) because the line width is so small (thin), force the dash pattern to solid instead. (iii) plot.TukeyHSD() should not use lwd=0 (0.5 is plenty of difference to be obviously lighter than the main plot lines). I will commit these unless there are better suggestions or bitter objections. Paul, The difference between the working previous version (R 2.11.1) and the failing current-still-yesterday version (R 2.12.1 RC) was:

$ diff -U2 oldtukeyplot.ps /Volumes/TIKKU/tukeyplot.ps
--- oldtukeyplot.ps 2010-12-14 12:06:07.0 +0200
+++ /Volumes/TIKKU/tukeyplot.ps 2010-12-14 12:13:32.0 +0200
@@ -172,5 +172,5 @@
 0 setgray
 0.00 setlinewidth
-[ 3.00 5.00] 0 setdash
+[ 0.00 0.00] 0 setdash
 np
 660.06 91.44 m

So 0.00 setlinewidth worked, but [0.00 0.00] 0 setdash failed. Assuming PostScript is anything like English, it is the all-zero dash that caused the failure.
Cheers, Jari Oksanen Paul On 15/12/2010 7:20 a.m., Ben Bolker wrote: On 10-12-14 01:16 PM, Peter Ehlers wrote: On 2010-12-14 09:27, Ben Bolker wrote: Jari Oksanen jari.oksanen at oulu.fi writes: Hello R Developers, I ran some standard tests with the currently (today morning) compiled R release candidate in Linux, R 2.12.1 RC (2010-12-13 r53843). Some of these tests used the plot.TukeyHSD function. This worked OK on the screen (X11 device), but the PostScript file could not be rendered. The following example had the problem for me:

postscript(file = "tukeyplot.ps")
example(plot.TukeyHSD)
dev.off()

I couldn't view the resulting file with evince in Linux nor in the standard Preview in MacOS. When I compared the generated tukeyplot.ps to the same file generated with an older R on my Mac, I found one difference:

$ diff -U2 oldtukeyplot.ps /Volumes/TIKKU/tukeyplot.ps
--- oldtukeyplot.ps 2010-12-14 12:06:07.0 +0200
+++ /Volumes/TIKKU/tukeyplot.ps 2010-12-14 12:13:32.0 +0200
@@ -172,5 +172,5 @@
 0 setgray
 0.00 setlinewidth
-[ 3.00 5.00] 0 setdash
+[ 0.00 0.00] 0 setdash
 np
 660.06 91.44 m

Editing the changed line back to its old value [ 3.00 5.00] 0 setdash also fixed the problem both in Linux and in Mac. Evidently something has changed, and probably somewhere else than in plot.TukeyHSD (which hasn't changed since r51093 in trunk and never in the R-2-12 branch). I know nothing about PostScript so I cannot say anything more (and I know viewers can fail with standard-conforming PostScript, but it is a bit disconcerting that two viewers fail when they worked earlier). I must really be avoiding work today ... I can diagnose this (I think) but don't know the best way to solve it. At this point, line widths on PDF devices were allowed to be < 1.
== r52180 | murrell | 2010-06-02 23:20:33 -0400 (Wed, 02 Jun 2010) | 1 line Changed paths: M /trunk/NEWS M /trunk/src/library/grDevices/src/devPS.c allow lwd less than 1 on PDF device == The behavior of PDF devices (by experiment) is to draw a 0-width line as 1 pixel wide, at whatever resolution is currently being rendered. On the other hand, 0-width lines appear to break PostScript (with the Linux viewer 'evince' I get warnings about rangecheck -15 when trying to view such a file). plot.TukeyHSD contains the lines

abline(h = yvals, lty = 1, lwd = 0, col = "lightgray")
abline(v = 0, lty = 2, lwd = 0, ...)

which are presumably meant to render minimum-width lines. I don't know whether it makes more sense to (1) change plot.TukeyHSD to use positive widths (although that may not help: I tried setting lwd=1e-5 and got the line widths rounded to 0 in the PostScript file); or (2) change the postscript driver to *not* allow line widths < 1 (i.e., distinguish between PS and PDF and revert to the pre-r52180 behaviour for PS only). On reflection #2 seems to make more sense, but digging through devPS.c it's not immediately obvious to me where/how in SetLineStyle or PostScriptSetLineTexture one can tell whether the current driver is PS or PDF ... That may not do it. I find the same
[Rd] postscript failure manifests in plot.TukeyHSD
Hello R Developers, I ran some standard tests with the currently (today morning) compiled R release candidate in Linux, R 2.12.1 RC (2010-12-13 r53843). Some of these tests used the plot.TukeyHSD function. This worked OK on the screen (X11 device), but the PostScript file could not be rendered. The following example had the problem for me:

postscript(file = "tukeyplot.ps")
example(plot.TukeyHSD)
dev.off()

I couldn't view the resulting file with evince in Linux nor in the standard Preview in MacOS. When I compared the generated tukeyplot.ps to the same file generated with an older R on my Mac, I found one difference:

$ diff -U2 oldtukeyplot.ps /Volumes/TIKKU/tukeyplot.ps
--- oldtukeyplot.ps 2010-12-14 12:06:07.0 +0200
+++ /Volumes/TIKKU/tukeyplot.ps 2010-12-14 12:13:32.0 +0200
@@ -172,5 +172,5 @@
 0 setgray
 0.00 setlinewidth
-[ 3.00 5.00] 0 setdash
+[ 0.00 0.00] 0 setdash
 np
 660.06 91.44 m

Editing the changed line back to its old value [ 3.00 5.00] 0 setdash also fixed the problem both in Linux and in Mac. Evidently something has changed, and probably somewhere else than in plot.TukeyHSD (which hasn't changed since r51093 in trunk and never in the R-2-12 branch). I know nothing about PostScript so I cannot say anything more (and I know viewers can fail with standard-conforming PostScript, but it is a bit disconcerting that two viewers fail when they worked earlier). Cheers, Jari Oksanen
Re: [Rd] One possible cause for incorrect symbols in X11() output
On 19/08/10 09:55 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: There have been spasmodic reports of symbols such as pi and infinity in plotmath being reproduced incorrectly on the X11 device on some Linux systems (at least Ubuntu 10 and Fedora 12/13), and we've managed to track down one cause whilst investigating PR#14355. Some systems have Wine and hence the Wine symbol font installed. 'fontconfig', which is used by cairographics in X11(type='cairo') and many other applications, prefers the Wine symbol font to the standard Type 1 URW font, and seems to misinterpret its encoding. You may well have Wine installed without realizing it (as I did) -- it is increasingly common as a dependency of other software. The best test is to run

% fc-match symbol
s05l.pfb: Standard Symbols L Regular

This is the result on a system without Wine: if you see

% fc-match symbol
symbol.ttf: Symbol Regular

This seems to be the case with MacOS (10.6.4):

$ uname -a
Darwin lettu-2.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386
$ fc-match symbol
Symbol.ttf: Symbol 標準體

The X11(type = 'cairo') device shows the problem with example(points); TestChars(font=5). However, there is no problem with the default device (quartz), nor with the default X11(), which has type = 'Xlib' (unlike documented in ?X11: 'cairo' is available but 'Xlib' is still used). Whatever this is worth (if this is worthless, I'll surely hear about it). Cheers, Jari Oksanen
Re: [Rd] One possible cause for incorrect symbols in X11() output
On 19/08/10 14:04 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: OSX. I can't get fc-match to list it, anyway. R's X11(type='cairo') device is using a version of cairographics compiled by Simon which includes a static build of fontconfig. So it is not really 'OSX'! I'm guessing you are using /usr/local/bin/fc-match, which AFAIK is also Simon's.

$ which fc-match
/usr/X11/bin/fc-match

There seems to be no fc-match in /usr/local/bin/ on my Mac, so none of Simon's utilities. (But this is, of course, pretty irrelevant for the main subject, and it seems that my installation of Ubuntu 10.04 is not affected by the problem but has quite regular fonts -- no Wine today. Better that I shut up.) Cheers, Jari Oksanen It is also not using pango, and so not selecting fonts the same way as on Linux.
Re: [Rd] warning from install.packages()
On 25/05/10 23:25 PM, Ben Bolker bol...@ufl.edu wrote: Just curious: is there a particular reason why install.packages() gives a warning in normal use when 'lib' is not specified (e.g. argument 'lib' is missing: using '/usr/local/lib/R/site-library')? It would seem to me that this is normal behavior as documented ("If missing, defaults to the first element of .libPaths()".) Indeed, should this be a message()? cheers, jaz
Re: [Rd] R-Forge Problems
On 5/05/10 20:53 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: If I go to: http://r-forge.r-project.org/scm/?group_id=18 and click on [Browse Subversion Repository] in the box to the right, it takes me to a page that says this:

Traceback (most recent call last):
  File /usr/lib/gforge/bin//viewcvs.cgi, line 27, in
    import sapi
ImportError: No module named sapi

whereas I was expecting to get to the svn repository. Gabor, This was already queried in R-Forge site-help (May 2) and reported as a bug in R-Forge (May 3, bug #925). There has been no response to either of these reports. A News message of April 29 on the R-Forge front page predicts that browser functionality will follow soon. So there is hope... Cheers, Jari Oksanen
Re: [Rd] Canberra distance
On 06/02/2010 18:10, Duncan Murdoch murd...@stats.uwo.ca wrote: On 06/02/2010 10:39 AM, Christophe Genolini wrote: Hi the list, According to what I know, the Canberra distance between X and Y is: sum[ (|x_i - y_i|) / (|x_i|+|y_i|) ] (with | | denoting the absolute value function). In the source code of the Canberra distance in the file distance.c, we find:

sum = fabs(x[i1] + x[i2]);
diff = fabs(x[i1] - x[i2]);
dev = diff/sum;

which corresponds to the formula: sum[ (|x_i - y_i|) / (|x_i + y_i|) ] (note that this does not define a distance... This is correct when x_i and y_i are positive, but not when a value is negative.) Is it on purpose or is it a bug? It matches the documentation in ?dist, so it's not just a coding error. It will give the same value as your definition if the two items have the same sign (not only both positive), but different values if the signs differ. The first three links I found searching Google Scholar for Canberra distance all define it only for non-negative data. One of them gave exactly the R formula (even though the absolute value in the denominator is redundant), the others just put x_i + y_i in the denominator. G'day cobbers, Without checking the original sources (which I can't do before Monday), I'd say that the Canberra distance was originally suggested only for non-negative data (abundances of organisms, which are non-negative if observed directly). The fabs(x-y) notation was used just as a convenient tool to get rid of the original pmin(x,y) for non-negative data -- which is nice in R, but not so natural in C. Extension of the Canberra distance to negative data probably makes a new distance, perhaps deserving a new name (Eureka distance?). If you ever go to Canberra and drive around, you'll see that it's all going through a roundabout after a roundabout, and going straight somewhere means goin' 'round 'n' 'round. That may make you skeptical about the Canberra distance.
Cheers, Jazza Oksanen
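To make the formulas in this thread concrete, here is a small sketch (my illustration, not code from the thread) of the textbook Canberra distance with the |x_i| + |y_i| denominator, compared with dist(). For non-negative data the two denominators coincide, so the values agree; note that in much later R versions the implementation of method = "canberra" was itself changed, so the comparison below only illustrates the non-negative case:

```r
## Textbook Canberra distance: sum |x_i - y_i| / (|x_i| + |y_i|)
canberra <- function(x, y) sum(abs(x - y) / (abs(x) + abs(y)))

x <- c(1, 2, 3)
y <- c(2, 1, 5)
canberra(x, y)                                      # 1/3 + 1/3 + 2/8
as.numeric(dist(rbind(x, y), method = "canberra"))  # same for non-negative data
```

With mixed signs the |x_i + y_i| denominator can be smaller than |x_i| + |y_i|, which is exactly why the triangle inequality (and hence the "distance" property) can fail.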
Re: [Rd] optional package dependency
On Fri, 2010-01-15 at 00:12 -0600, Jeff Ryan wrote: Hi Ross, The quantmod package makes available routines from a variety of contributed packages, but gets around your issues with a bit of, um, trickery. Take a look here (unless your name is Kurt ;-) ): http://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/pkg/R/buildModel.methods.R?rev=367&root=quantmod&view=markup It would be nice to have Suggests really mean suggests to check, but I am sure there is a good reason it doesn't. I agree: it would be nice to have Suggests really mean suggests, and I 'suggested' so in an R-devel message of 20/9/05 with the subject Shy Suggestion (but this seems not to exist in the R-devel archive?). I got some support, but not from the right people, and so the R suggestion remains the one you can't refuse, or you'll wake up with a horse's head in your bed. I can live with this forced suggestion, although it is sometimes painful, in particular on Mac or after re-installing everything from scratch in Linux. The main argument was that building may fail later if you don't check suggests early, so that you must (de facto) depend on packages you suggest. I'm sure many packages would fail now if the interpretation of suggests was changed, because the behaviour of suggests and depends has been near identical for a long time and people have adapted. The window of opportunity for another interpretation was when the checks for undefined require() were added to the R CMD check routines in 2005, but then it was decided that suggests should be near equivalent to depends, and this will stick. Cheers, Jari Oksanen -- Jari Oksanen, Department of Biology, Univ Oulu, FI-90014 Oulu, Finland http://www.oulu.fi/~jarioksa http://vegan.r-forge.r-project.org
Re: [Rd] file.rename overwrites existing target (PR#14065)
On 15/11/09 16:35 PM, jo...@web.de wrote: Full_Name: Jens Oehlschlägel Version: 2.10.0 OS: Windows XP Professional Submission from: (NULL) (85.181.158.112) file.rename() will successfully rename file a to b - even if b exists already. Though the documentation does not state what file.rename() will do in this case, I guess the expected behaviour is to fail and return FALSE. The *expected* behaviour is to overwrite the old file. Your expectation seems to be different, but overwriting or deleting the old file has been the behaviour forever (i.e., since the 1970s). This is how MacOS defines the behaviour of the system command 'rename':

RENAME(2)    BSD System Calls Manual
NAME
     rename -- change the name of a file
...
DESCRIPTION
     The rename() system call causes the link named old to be renamed as new.
     If new exists, it is first removed.

The behaviour is the same in all posixy systems. Sensible systems like R follow the documented standard behaviour. Why would you expect that 'file.rename' fails if the 'new' file exists? The unix command 'mv' (move) that does the 'rename' has a switch to override the standard 'rename' system call and prompt for the removal of the 'new' file. However, this switch is usually not the default in unixy systems, unless defined so in the shell start-up script of the user. Cheers, Jari Oksanen
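If non-clobbering behaviour is wanted, it is easy to build on top of file.rename(). A minimal sketch (my illustration, with the caveat that the existence test and the rename are two separate steps, so another process could still create the target in between):

```r
## Refuse to rename over an existing target; file.rename() itself follows
## the POSIX rename() semantics and silently replaces 'to'.
safe.rename <- function(from, to) {
  if (file.exists(to))
    stop(gettextf("not renaming: '%s' already exists", to))
  file.rename(from, to)
}
```

This keeps the fast, atomic system call for the common case and adds the stricter policy only where the caller asks for it.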
Re: [Rd] eurodist example dataset is malformed
Justin, I suggest you try to remove your malformed eurodist and use the one in R. The svn logs show no changes in eurodist since 2005, when 'r' was added to 'Gibralta' (it still has all the wrong distances, which perhaps go back to the poor quality of the Cambridge Encyclopaedia). I also installed R 2.9.1 for MacOS to see that there is no change in 'eurodist' in the Mac distribution either. My virgin eurodist on the Mac was clean, with all its errors. All this hints that you have a local copy of a malformed eurodist on your computer. Perhaps rm(eurodist); eurodist will help. Cheers, Jari Oksanen On 15/08/09 06:13 AM, Justin Donaldson jjdon...@indiana.edu wrote: Here's my osx data/session info (identical after a re-install):

> class(eurodist)
[1] "data.frame"
> sessionInfo()
R version 2.9.1 (2009-06-26)
i386-apple-darwin8.11.1

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

-Justin On Thu, Aug 13, 2009 at 4:48 AM, Gavin Simpson gavin.simp...@ucl.ac.uk wrote: On Wed, 2009-08-12 at 20:26 -0400, Justin Donaldson wrote: The eurodist dataset (my favorite for mds) is malformed. Instead of a standard distance matrix, it's a data frame. The rownames have gotten 'bumped' to a new anonymous dimension X. It's possible to fix the data, but it messes up a lot of example code out there.

          X Athens Barcelona Brussels Calais ...
1    Athens      0      3313     2963   3175
2 Barcelona   3313         0     1318   1326
3  Brussels   2963      1318        0    204
4    Calais   3175      1326      204      0
5 Cherbourg   3339      1294      583    460
6   Cologne   2762      1498      206    409
...

Best, -Justin What version of R, platform, loaded packages etc? This is not what I see on Linux, 2.9.1-patched r49104.
> class(eurodist)
[1] "dist"
> sessionInfo()
R version 2.9.1 Patched (2009-08-07 r49104)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_2.9.1

Have you tried this in a clean session to see if it persists there? If you can reproduce this in a clean session with an up-to-date R or R-Devel, then send details of your R back to the list for further investigation. HTH G -- Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
[Rd] cmdscale non-Euclidean dissimilarities
Dear R gurus, I think that the cmdscale() function has problems with non-Euclidean distances which have negative eigenvalues. The problems are two-fold: (1) The wrong eigenvalue is removed: there will be at least one zero eigenvalue in cmdscale, and the function assumes it is the last one. With non-Euclidean dissimilarities you will have negative eigenvalues, and the zero eigenvalue will be the last positive one before the negative eigenvalues. Now the function returns the zero eigenvalue and the corresponding zero eigenvector, but drops the last negative eigenvalue (which has a larger absolute value than any other negative eigenvalue). (2) Gower (1985) says that with non-Euclidean matrices and negative eigenvalues you will have imaginary axes, and the distances on the imaginary axes (negative eigenvalues) should be subtracted from the distances on the real axes (positive eigenvalues). The formulation in the article is like this (Gower 1985, p. 93): f_{ii} + f_{jj} - 2 f_{ij} = d_{ij}^2 = \sum_{p=1}^r (l_{pi} - l_{pj})^2 - \sum_{p=r+1}^{r+s} (l_{pi} - l_{pj})^2 This is the usual Pythagorean representation of squared distances in terms of coordinates $l_{pi} (p = 1, 2 \ldots r+s)$, except that for $p > r$ the coordinates become purely imaginary. This also suggests that for the GOF (goodness of fit) measure of cmdscale() the negative eigenvalues should be subtracted from the sum of positive eigenvalues. Currently, the function uses two ways: the sum of absolute values of the eigenvalues (when it should be the sum of eigenvalues with their signs), and the sum of above-zero eigenvalues for the total. The latter makes some sense, but the first looks non-Gowerian. Reference: Gower, J. C. (1985) Properties of Euclidean and non-Euclidean distance matrices. Linear Algebra and its Applications 67, 81--97. The following change seems to avoid both problems. The change removes only the eigenvalue that is closest to zero. There may be more than one zero eigenvalue (or of magnitude 1e-17), but this leaves the rest there.
It also changes the way the first alternative of GOF is calculated. This changes the code as little as possible, and it still leaves behind some cruft of the old code that assumed that the last eigenvalue is the zero eigenvalue.

--- R/src/library/stats/R/cmdscale.R (revision 48741)
+++ R/src/library/stats/R/cmdscale.R (working copy)
@@ -56,6 +56,9 @@
         x[non.diag] <- (d[non.diag] + add.c)^2
     }
     e <- eigen(-x/2, symmetric = TRUE)
+    zeroeig <- which.min(abs(e$values))
+    e$values <- e$values[-zeroeig]
+    e$vectors <- e$vectors[ , -zeroeig, drop = FALSE]
     ev <- e$values[1L:k]
     if(any(ev < 0))
         warning(gettextf("some of the first %d eigenvalues are < 0", k),
@@ -63,9 +66,9 @@
     points <- e$vectors[, 1L:k, drop = FALSE] %*% diag(sqrt(ev), k)
     dimnames(points) <- list(rn, NULL)
     if (eig || x.ret || add) {
-        evalus <- e$values[-n]
+        evalus <- e$values
         list(points = points, eig = if(eig) ev, x = if(x.ret) x,
              ac = if(add) add.c else 0,
-             GOF = sum(ev)/c(sum(abs(evalus)), sum(evalus[evalus > 0])))
+             GOF = sum(ev)/c(sum(evalus), sum(evalus[evalus > 0])))
     } else points

Best wishes, Jari Oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, FI-90014 Oulu, Finland email jari.oksa...@oulu.fi, homepage http://cc.oulu.fi/~jarioksa/
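The negative eigenvalues at issue are easy to see directly from Gower's double centring, i.e. the decomposition that cmdscale() performs internally. An illustrative sketch (my code, not part of the proposed patch), using Manhattan distances, which are generally non-Euclidean:

```r
## Double-centre squared dissimilarities and inspect the eigenvalues.
set.seed(1)
n <- 8
d  <- dist(matrix(runif(2 * n), nrow = n), method = "manhattan")
D2 <- as.matrix(d)^2
J  <- diag(n) - matrix(1/n, n, n)   # centring matrix
B  <- -0.5 * J %*% D2 %*% J         # the matrix cmdscale() decomposes
eigen(B, symmetric = TRUE)$values   # usually some eigenvalues < 0 here
```

There is always at least one exact zero eigenvalue (the centring removes one dimension), and with a non-Euclidean d it typically sits between the positive and negative eigenvalues rather than at the end, which is point (1) above.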
Re: [Rd] Windows binary packages R-Forge
On Wed, 2008-05-07 at 09:48 +0200, Yohan Chalabi wrote: Hi room, There seems to be a problem with the Windows build machines of R-Forge. All our packages with Fortran source code cannot be compiled for Windows. The error in the log file is make[3]: gfortran: Command not found It seems that gfortran is not installed. Is there any plan to fix this, or am I doing something wrong on R-Forge? Thanks in advance for your advice. Dear Yohan Chalabi, This has been reported on the R-Forge support forum on 29 April, 2008: https://r-forge.r-project.org/tracker/index.php?func=detail&aid=139&group_id=34&atid=194 Thomas Petzold even posted there the probable cure. I hope the issue will be solved some day soon. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
[Rd] xspline(..., draw=FALSE) fails if there is no open device (PR#10727)
Full_Name: Jari Oksanen Version: 2.6.2 RC (2008-02-07 r44369) OS: Linux Submission from: (NULL) (130.231.102.145) Even if the function xspline() is called with argument draw=FALSE, it requires a graphics device (which it won't use, since draw=FALSE). I ran into this because I intended to use xspline within a function (that does not yet draw: there is a plot method for that), and the function failed when called in a virgin environment. Here is an example in a virgin environment just after starting R:

> out <- xspline(c(0,1,0), c(1,0,1), draw=FALSE)
Error in xspline(c(0, 1, 0), c(1, 0, 1), draw = FALSE) :
  plot.new has not been called yet
> str(out)
Error in str(out) : object "out" not found

This works:

> plot(0)
> out <- xspline(c(0,1,0), c(1,0,1), draw=FALSE)
> str(out)
List of 2
 $ x: num [1:3] 0 1 0
 $ y: num [1:3] 1 0 1

This won't:

> dev.off()
null device
          1
> xspline(c(0,1,0), c(1,0,1), draw=FALSE)
Error in xspline(c(0, 1, 0), c(1, 0, 1), draw = FALSE) :
  plot.new has not been called yet

R graphics internals are black magic to me. However, it seems that the error message comes from the function GCheckState(DevDesc *dd) in graphics.c, which is called by do_xspline(SEXP call, SEXP op, SEXP args, SEXP env) in plot.c even when xspline was called with draw = FALSE (and even before getting the argument draw into do_xspline). It seems that a graphics device is needed somewhere even with draw = FALSE, since moving the GCheckState() test after finding the value of draw, and executing the test only if draw=TRUE, gave NaN as the numeric output. If this is documented behaviour, the documentation escaped my attention and I beg your pardon. It may be useful to add a comment on the help page saying that an open graphics device is needed even when unused with draw=FALSE.
Cheers, Jari Oksanen platform = i686-pc-linux-gnu arch = i686 os = linux-gnu system = i686, linux-gnu status = RC major = 2 minor = 6.2 year = 2008 month = 02 day = 07 svn rev = 44369 language = R version.string = R version 2.6.2 RC (2008-02-07 r44369) Locale: LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C Search Path: .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base
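A practical workaround for the report above is to open a throwaway device (and call plot.new() to satisfy the check) before the draw = FALSE call. A sketch, assuming a pdf device whose output is discarded; pdf(file = NULL) is supported in later R versions:

```r
## Workaround sketch: give xspline(draw = FALSE) the device state it
## insists on, without drawing anything visible anywhere.
pdf(file = NULL)          # scratch device, nothing written to disk
plot.new()                # satisfies the "plot.new has not been called" check
out <- xspline(c(0, 1, 0), c(1, 0, 1), shape = 1, draw = FALSE)
dev.off()
str(out)                  # list of interpolated x and y coordinates
```

Note that the returned coordinates are interpreted in the current user coordinate system, so a real caller would normally set up the plot region first anyway.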
Re: [Rd] Saving Graphics File as .ps or .pdf (PR#10403)
On Wed, 2007-11-07 at 10:51 +0100, Simone Giannerini wrote: [snip] (this is from pd = Peter Dalgaard) Maybe, but given the way things have been working lately, it might be better to emphasize (a) check the mailing lists (b) try R-patched (c) if in doubt, ask, rather than report as a bug (Ideally, people would try the prerelease versions and problems like this would be caught before the actual release, but it seems that they prefer treating x.y.0 as a beta release...) I am sorry, but I do not agree with point (b), for the very simple fact that the average Windows user does not know how to compile the source code and might not even want to learn how to do it. The point is that since (if I am correct) the great majority of R users use Windows, you would miss an important part of potential bug reports by requiring point (b), whereas (a) and (c) would suffice IMHO. Maybe if there were Windows binaries of the prerelease version available some time before the release you would get much more feedback, but I am just guessing. First I must say that patched Windows binaries are available from CRAN with one extra click -- Linux and poor MacOS users must use 'svn co' to check out the patched version from the repository and compile from the sources. The attribute poor for MacOS users was there because this is a bigger step for Mac users than for Linux users (who can easily get and install all the tools they need, and tend to have a different kind of mentality). Then I must say that I do not like this policy either. I think that it is fair to file a bug report against the latest release version in good faith without being chastised and condemned. I know (like pd says above) that some people really do treat x.y.0 as beta releases: a friend of mine over here even refuses to install R x.y.0 versions just for this reason (in fact, he's pd's mate, too, but perhaps pd can talk him over to try x.y.0 versions). Filing a bug report against the latest x.y.1 shouldn't be too bad either.
I guess the problem here is that R bug reports are linked to the Rd mailing list, and reports on already fixed bugs really are irritating. In more loosely connected bug reporting systems you could simply mark a bug as a duplicate of # and mark it as resolved without generating an awful lot of mail. Then it would be humanly possible to adopt a more neutral way of answering people who report bugs in the latest releases. Probably that won't happen in the current environment. Cheers, Jari Oksanen PS. Please Mr Moderator, don't treat me so mean (*): I've subscribed to this group although you regularly reject my mail as coming from a non-member. (*) an extract from a classic song Mr R jumped the rabbit. -- Jari Oksanen [EMAIL PROTECTED]
Re: [Rd] R can't source() long lines (PR#10383)
On Tue, 2007-10-30 at 08:10 +0100, [EMAIL PROTECTED] wrote: This is as documented in ?source, and so is not a bug. This gives us a FAQ answer: Q: What is the difference between a feature and a bug? A: Features are documented, bugs are undocumented. If it is a bug, it is either a bug in a function or a bug in the documentation (usually the latter). cheers, jari oksanen
Re: [Rd] boxplot() confuses x- and y-axes (PR#10345)
On Mon, 2007-10-15 at 15:25 +0200, [EMAIL PROTECTED] wrote: "ms" == marc schwartz [EMAIL PROTECTED] on Mon, 15 Oct 2007 14:20:16 +0200 (CEST) writes: ms On Mon, 2007-10-15 at 10:30 +0200, [EMAIL PROTECTED] wrote: Full_Name: Bob O'Hara Version: 2.6.0 OS: Windows XP Submission from: (NULL) (88.112.20.250)

Using horizontal=TRUE with boxplot() confuses it as to what is an x- or y-axis. At least, xlim= and ylim= are the wrong way round; log="x" (or "y") and xaxt= work as expected. I haven't looked at anything else. Some code to see if you can reproduce the bug (or discover it's in my head...):

boxplot(count ~ spray, data = InsectSprays)
# Try to change the x-axis:
boxplot(count ~ spray, data = InsectSprays, xlim = c(0, 50))
# Plot horizontally:
boxplot(count ~ spray, data = InsectSprays, horizontal = TRUE)
# Now try to change the x-axis:
boxplot(count ~ spray, data = InsectSprays, horizontal = TRUE,
        xlim = c(0, 50))  # Changes the y-axis!
# Now try to change the y-axis:
boxplot(count ~ spray, data = InsectSprays, horizontal = TRUE,
        ylim = c(0, 50))  # Changes the x-axis!
# Plot the x-axis on a log scale:
boxplot(count + 1 ~ spray, data = InsectSprays, horizontal = TRUE,
        log = "x")        # Does indeed change the x-axis
# Don't add ticks on the x-axis:
boxplot(count ~ spray, data = InsectSprays, horizontal = TRUE,
        xaxt = "n")       # Works as expected.

ms Hi Bob, ms No, it's not in your head. This is documented in ?bxp, which is the ms function that actually does the plotting for boxplot(). See the ms description of 'pars' in ?bxp: ms "Currently, yaxs and ylim are used 'along the boxplot', i.e., ms vertically, when horizontal is false, and xlim horizontally." ms So essentially, the named 'x' and 'y' axes are rotated 90 degrees when ms you use 'horizontal = TRUE', rather than the vertical axis always being ms 'y' and the horizontal axis always being 'x'. This has been discussed on ms the lists previously. Yes; thank you, Marc.
And the reason for this is very sensible, I think: if you have a longish boxplot() or bxp() command and you just want to go from vertical to horizontal or vice versa, it makes most sense just to have to change the 'horizontal' flag, and not to have to check whether there are other 'x*' or 'y*' arguments that all need to be changed as well. Except that you must change xaxt/yaxt and log="x"/log="y", which do not follow the 'along the box' logic and behave differently from xlim/ylim. Nothing of this is fatal, but it probably needs more than one iteration to find which way each of the x* and y* arguments works. cheers, jari oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
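For reference, a short R sketch of the convention discussed above. This only illustrates the documented bxp() behaviour (ylim runs "along the boxes" even when the boxes are horizontal, while log and xaxt keep their literal axis meaning); nothing here is new functionality:

```r
## With horizontal = TRUE, 'ylim' still runs "along the boxes", which is
## now the horizontal direction on the page:
pdf(file = NULL)  # draw to a null device so the sketch runs anywhere
boxplot(count ~ spray, data = InsectSprays,
        horizontal = TRUE, ylim = c(0, 50))   # sets the horizontal range
## 'log' and 'xaxt', by contrast, refer literally to the x-axis:
boxplot(count + 1 ~ spray, data = InsectSprays,
        horizontal = TRUE, log = "x")         # log scale, horizontal axis
boxplot(count ~ spray, data = InsectSprays,
        horizontal = TRUE, xaxt = "n")        # suppress x-axis ticks
invisible(dev.off())
```

So xlim/ylim rotate with the boxes, but log="x"/xaxt="n" do not, which is exactly the asymmetry complained about in the thread.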
Re: [Rd] paste() with NAs .. change worth pursuing?
On 22 Aug 2007, at 20:16, Duncan Murdoch wrote: On 8/22/2007 11:50 AM, Martin Maechler wrote: Consider this example code:

c1 <- letters[1:7]; c2 <- LETTERS[1:7]
c1[2] <- c2[3:4] <- NA
rbind(c1, c2)
##    [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## c1 "a"  NA   "c"  "d"  "e"  "f"  "g"
## c2 "A"  "B"  NA   NA   "E"  "F"  "G"
paste(c1, c2)
## [1] "a A"  "NA B" "c NA" "d NA" "e E"  "f F"  "g G"

where a more logical result would have entries 2:4 equal to NA, i.e., as.character(NA), aka NA_character_. Is this worth pursuing, or does anyone see why not?

A fairly common use of paste is to put together reports for human consumption. Currently we have

p <- as.character(NA)
paste("the value of p is", p)
[1] "the value of p is NA"

which looks reasonable. Would this become

paste("the value of p is", p)
[1] NA

under your proposal? (In a quick search I was unable to find a real example where this would happen, but it would worry me...)

At least stop() seems to include such a case:

message <- paste(args, collapse = "")

and we may expect there are NAs sometimes in stop(). cheers, jazza -- Jari Oksanen, Oulu, Finland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
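The NA-propagating behaviour Martin proposes can be prototyped in user code. A minimal sketch -- the helper name `paste_na` is made up, and it assumes all `...` arguments are vectors to be pasted (no `sep`/`collapse` handling):

```r
## Current behaviour: NA is pasted as the literal string "NA".
paste("the value of p is", NA_character_)   # "the value of p is NA"

## Hypothetical NA-propagating wrapper (the name 'paste_na' is invented):
paste_na <- function(...) {
  out <- paste(...)
  ## an entry is NA if any of the pasted arguments was NA there
  any_na <- Reduce(`|`, lapply(list(...), is.na))
  out[any_na] <- NA_character_
  out
}

c1 <- letters[1:7]; c2 <- LETTERS[1:7]
c1[2] <- c2[3:4] <- NA
paste_na(c1, c2)    # entries 2:4 come out as NA, as Martin suggests
```

This shows the proposal is easy to emulate without changing paste() itself, which may be the safest answer to Duncan's report-formatting worry.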
Re: [Rd] package check note: no visible global function definition (in functions using Tcl/Tk)
On Tue, 2007-06-12 at 00:42 +0200, Henrik Bengtsson wrote: On 6/11/07, Seth Falcon [EMAIL PROTECTED] wrote: Prof Brian Ripley [EMAIL PROTECTED] writes: It seems that this happens if package tcltk is missing from the Depends: list in the DESCRIPTION file. I just tested with Amelia and homals, and that solved the various warnings in both cases.

Adding tcltk to Depends may not always be the desired solution. If tcltk is already in Suggests, for example, and the intention is to optionally provide GUI features, then the code may be correct as-is. That is, codetools will issue the NOTEs if you have a function that looks like:

f <- function() {
    if (require(tcltk)) {
        someTcltkFunctionHere()
    } else {
        otherwiseFunction()
    }
}

There are a number of packages in the BioC repository that provide such optional features (not just for tcltk), and it would be nice to have a way of declaring the use such that the NOTE is silenced.

Same scenario here: I am using Suggests, and I found that the NOTEs go away if you call the function with the double colon (::), e.g. tcltk::someTcltkFunctionHere(). I also got several NOTEs about non-declared objects if I used require(pkgname), but they go away with require("pkgname").

The real problem here is what the consequences are for CRAN auditing with the new defaults. Do you have to pass these tests as well? Will you implement stricter package dependence checking? Will you still allow the check circumvention device that Henrik, perhaps unwisely, revealed here (that is, package::function)? Just being curious, I ran checkUsagePackage() for my CRAN package (vegan), and got 109 messages. 58 of these were "local variable assigned but may not be used" and need to be checked. My first impression was that they are just harmless leftovers, and removing them is not among my top priorities but may wait till September. Some were false positives. Most of the rest (49 + 1 special case) were calls to functions in other packages with require || stop in the function body.
I'd like to keep them like this, or at least with the circumvention device. Please don't make this test a requirement in CRAN submissions! One real error was detected as well, but fixing that error broke the function, since the rest of the function was already expecting the erroneous output in order to work correctly. I urge more relaxed dependence checking that allows calls to other packages within functions. I've been a Linux user since Red Hat 5.1 and I know what dependency hell is (a package depending on a package depending ... depending on a broken package). There already are some signs of that in R, in particular on unsupported platforms like MacOS 10.3.9, where I have trouble installing some packages that depend on packages... (if somebody wonders why I still use MacOS 10.3.9, I can give 129 reasons, each worth one Euro). cheers, Jari Oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
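For concreteness, here is a minimal sketch of the "require || stop" pattern the thread keeps referring to. The package name `fancypkg` and its function `fancyFun` are hypothetical; in current R, `requireNamespace("fancypkg", quietly = TRUE)` is the usual way to test availability of a suggested package:

```r
## The pattern under discussion: use a suggested package's function only
## if the package is available, otherwise fail with a clear message.
## ('fancypkg' and 'fancyFun' are invented names for illustration.)
myfun <- function(x) {
  if (require(fancypkg)) {
    fancypkg::fancyFun(x)   # the '::' form also silences codetools NOTEs
  } else {
    stop("package 'fancypkg' is needed for this function")
  }
}

## With 'fancypkg' not installed, the call stops gracefully:
tryCatch(myfun(1), error = function(e) conditionMessage(e))
```

The point raised in the thread is that codetools cannot see that `fancyFun` only runs when the package has been attached, hence the "no visible global function definition" NOTE for exactly this correct code.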
Re: [Rd] 'R CMD INSTALL mypkg' doesn't always update help pages
On 6 Jun 2007, at 01:45, Herve Pages wrote: Hi, 'R CMD INSTALL mypkg' and 'install.packages(mypkg, repos=NULL)' don't update mypkg help pages when mypkg is a source directory. They only install new help pages if there are some but they leave the already installed pages untouched. So you end up with mixed man pages from different versions of the package :-/ I have observed this, too. cheers, Jari Oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] step() in sink() and Sweave()
Dear developers, I just noticed that the step() function currently prints the current model using message(), but the resulting table using print(). The relevant commands within the step() body are:

if (trace)
    message("Start:  AIC=", format(round(bAIC, 2)), "\n",
            cut.string(deparse(as.vector(formula(fit)))), "\n")

(with example() output:)

Start:  AIC=190.69
Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality

And later:

if (trace)
    print(aod[o, ])

(with example() output:)

                   Df Sum of Sq    RSS   AIC
- Examination       1      53.0 2158.1 189.9
<none>                          2105.0 190.7
- Agriculture       1     307.7 2412.8 195.1
- Infant.Mortality  1     408.8 2513.8 197.0
- Catholic          1     447.7 2552.8 197.8
- Education         1    1162.6 3267.6 209.4

This is a nuisance if you want to divert output to a file with sink() or use step() in Sweave: the header and the table go to different places, and without the message() part the print() part is crippled. It may be that there is some way to avoid this, but obviously that needs some degree of acrobatic R skill. An example of the behaviour:

sink(tempfile())
example(step)
sink()

I assume that the behaviour is intentional, but searching NEWS did not give any information or reasoning. Would it be sensible to go back to the old behaviour? I found some Swoven files from R 2.4.0 that still put both parts of the output in the same place. For the sake of Sweave and sink(), I'd prefer the one place to be stdout instead of stderr. Best wishes, Jari Oksanen -- Jari Oksanen [EMAIL PROTECTED] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
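A sketch of why the two halves of step()'s output end up in different places: message() writes to the message stream (stderr), while sink() by default diverts only stdout. A second sink() with type = "message" (which requires a connection, not a file name) catches the rest:

```r
## sink() diverts only stdout; message() output goes to stderr.
out <- tempfile(); msg <- tempfile()
con <- file(msg, open = "wt")
sink(out)                      # divert stdout
sink(con, type = "message")    # divert the message stream too
message("Start:  AIC=190.69")  # would otherwise escape the first sink
print("table part")            # goes through the stdout sink
sink(type = "message")         # restore the message stream...
sink()                         # ...and stdout
close(con)
readLines(out)  # contains the print()ed part
readLines(msg)  # contains the message() part
```

So a workaround for the step()-in-sink() problem is to open both sinks, at the cost of also capturing warnings and errors in the message file.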
Re: [Rd] prcomp: problem with zeros? (PR#8870)
On 17 May 2006, at 22:02, [EMAIL PROTECTED] wrote: On Wed, 17 May 2006, [EMAIL PROTECTED] wrote: prcomp has a bug which causes the following error

Error in svd(x, nu = 0) : infinite or missing values in 'x'

on a valid data set (no Infs, no missing values). The error is most likely caused by the zeros in the data. Why do you say that? Without a reproducible example, we cannot judge what is going on. If you called prcomp with scale=TRUE on a matrix that has a completely zero (or constant) column, then this is a reasonable error message.

Constant columns (which are a likely reason here) indeed become NaN after scale(), but the error message was "Error in svd(x, nu = 0) : infinite or missing values in 'x'", and calling this 'reasonable' is stretching the limits of reason. However, in general this is easy to work around: scale() before the analysis and replace NaN with 0 (prcomp handles zeros). For instance:

x <- scale(x)
x[is.nan(x)] <- 0
prcomp(x)

(and a friendly prcomp() would do this internally.) cheers, jari oksanen -- Jari Oksanen, Oulu, Finland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
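A minimal sketch of the situation described in the report and of the suggested workaround, assuming the problem column is constant. (In current R versions, prcomp(x, scale. = TRUE) refuses a zero-variance column with a clearer error message than the svd() one quoted above.)

```r
## A valid data set with a constant (here all-zero) column:
set.seed(1)
x <- cbind(a = rnorm(10), b = rnorm(10), zero = 0)

## prcomp(x, scale. = TRUE)  # fails: a constant column cannot be
##                           # rescaled to unit variance

## The workaround from the post: scale first, zero out the NaNs.
xs <- scale(x)        # constant column becomes NaN (sd is 0)
xs[is.nan(xs)] <- 0   # replace NaN with 0; prcomp handles zeros
p <- prcomp(xs)       # now runs without error
```

The constant column then simply contributes nothing to any component, which is usually what one wants.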
Re: [Rd] typo in `eurodist'
Dear all, There really seem to be many exciting issues in spelling and in detecting spelling errors. However, a more disturbing feature of 'eurodist' to me is that the distances seem to be wrong. There are several cases where the triangle inequality is violated, so that a trip from A to B is shorter when you make a detour via X instead of going directly (see require(fortunes); fortune("eurodist") for an example). A quick look revealed that you can find such a shorter detour for 104 of the 210 distances in 'eurodist'. There is no guarantee that these shortest-path distances would be correct, but at least they are metric. Just for fun, here are the differences between the actual eurodist distances and the shortest paths among the towns in the eurodist data:

[A lower-triangular 21 x 21 table of differences (quoted distance minus shortest path) for Athens, Barcelona, Brussels, Calais, Cherbourg, Cologne, Copenhagen, Geneva, Gibralta, Hamburg, Hook of Holland, Lisbon, Lyons, Madrid, Marseilles, Milan, Munich, Paris, Rome, Stockholm and Vienna followed here; the column alignment did not survive the archive formatting, so the numbers are omitted.]

It seems that the marginal towns (Athens, Lisbon, Stockholm, Copenhagen) have the largest discrepancies. It also seems that the names are not 'localized', but weird English forms are used for places like København and Wien, so dear to the R core developers. cheers, jari oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
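The detour check described above can be sketched with a Floyd-Warshall all-pairs shortest-path computation over the eurodist matrix. This is my reconstruction of the check, not the original code, and the exact count may differ from the 104 reported in the post depending on how it was done:

```r
## Compare each quoted distance with the shortest path through
## intermediate towns; any entry where the detour is shorter violates
## the triangle inequality.
d <- as.matrix(eurodist)
sp <- d
for (k in seq_len(nrow(d)))           # Floyd-Warshall relaxation
  sp <- pmin(sp, outer(sp[, k], sp[k, ], `+`))

## Number of town pairs with a shorter detour (the post reports 104
## of the 210 pairs):
sum(sp[lower.tri(sp)] < d[lower.tri(d)])
```

The matrix `d - sp` is then exactly the table of differences shown (garbled) above: zero where the direct distance is already shortest, positive where a detour beats it.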
[Rd] Shy Suggestion?
The R-exts manual says about the 'Suggests' field in the package DESCRIPTION: "The optional `Suggests' field uses the same syntax as `Depends' and lists packages that are not necessarily needed." However, this seems to be a suggestion you cannot refuse. If you suggest packages (a line from DESCRIPTION):

Suggests: MASS, ellipse, rgl, mgcv, akima, lattice

this is what happens:

$ /tmp/R-alpha/bin/R CMD check vegan
* checking for working latex ... OK
* using log directory '/home/jarioksa/devel/R/vegan.Rcheck'
* using R version 2.2.0, 2005-09-19
* checking for file 'vegan/DESCRIPTION' ... OK
* this is package 'vegan' version '1.7-75'
... clip ...
* checking package dependencies ... ERROR
Packages required but not available:
  ellipse rgl akima

In my cultural context, suggesting a package means that it is not necessarily needed, and the check should not fail, although some functionality would be unavailable without those packages. I want the package to pass the tests in a clean standard environment without forcing anybody to load any extra packages. Is there a possibility to be modest and shy in suggestions, so that it would be up to the user to get the extra packages, without R CMD check requiring them? I stumbled on this with earlier versions of R, and then my solution was to suggest nothing. cheers, jari oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061 email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Shy Suggestion?
On Tue, 2005-09-20 at 09:42 -0400, Roger D. Peng wrote: I think this needs to fail because packages listed in 'Suggests:' may, for example, be needed in the examples. How can 'R CMD check' run the examples and verify that they are executable if those packages are not available? I suppose you could put the examples in a \dontrun{}. Yes, that's what I do, and exactly for that reason: if something is not necessarily needed (= a 'suggestion' in this culture), it should not be required in the tests. However, if I didn't use \dontrun{} for a non-recommended package, the check would fail at that point and I would get the needed information: so why should the check fail already when checking the DESCRIPTION? cheers, jari oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] generic function argument list problem
On Wed, 2005-08-31 at 08:09 +0100, Robin Hankin wrote: Hi, it says in R-exts that "A method must have all the arguments of the generic, including ... if the generic does. A method must have arguments in exactly the same order as the generic. A method should use the same defaults as the generic." So, how come the arguments for rep() are (x, times, ...) and the arguments for rep.default() are (x, times, length.out, each, ...)? Shouldn't these be the same? I am writing a rep() method for objects with class octonion, and my function rep.octonion() has the argument list (x, times, length.out, each, ...) just like rep.default(), but R CMD check complains about it, pointing out that rep() and rep.octonion() have different arguments. What do I have to do to my rep.octonion() function to make my package pass R CMD check without a warning?

I cannot reproduce your problem. Probably you did something differently from what you say (like omitted the ..., misspelled times as time, or something else in your rep.octonion). This is what I tried. In R:

str(rep)
function (x, times, ...)
rep.octonion <- function(x, times, length.out, each, ...) {}
package.skeleton("octonion", "rep.octonion")

Creating directories ...
Creating DESCRIPTION ...
Creating READMEs ...
Saving functions and data ...
Making help files ...
Created file named './octonion/man/rep.octonion.Rd'.
Edit the file and move it to the appropriate directory.
Done.
Further steps are described in ./octonion/README

Then I edited octonion/man/rep.octonion.Rd so that it uses the generic and passes R CMD check (virgin Rd files produced by package.skeleton fail the test, which I found a bit weird). Here are the minimum changes you need to pass the tests:

--- rep.octonion.Rd.orig	2005-08-31 10:56:36.000000000 +0300
+++ rep.octonion.Rd	2005-08-31 10:55:25.000000000 +0300
@@ -7,5 +7,5 @@
 }
 \usage{
-rep.octonion(x, times, length.out, each, ...)
+\method{rep}{octonion}(x, times, length.out, each, ...)
 }
 %- maybe also 'usage' for other objects documented here.
@@ -18,5 +18,5 @@
 }
 \details{
-  ~~ If necessary, more details than the __description__ above ~~
+  ~~ If necessary, more details than the description above ~~
 }
 \value{
@@ -31,7 +31,7 @@
 \note{ ~~further notes~~ }
-  ~Make other sections like Warning with \section{Warning }{} ~
-\seealso{ ~~objects to See Also as \code{\link{~~fun~~}}, ~~~ }
+
+\seealso{ ~~objects to See Also as \code{\link{rep}}, ~~~ }
 \examples{
 ## Should be DIRECTLY executable !!
@@ -42,4 +42,4 @@
 function(x, time, length.out, each, ...) {}
 }
-\keyword{ ~kwd1 }% at least one, from doc/KEYWORDS
-\keyword{ ~kwd2 }% __ONLY ONE__ keyword per line
+\keyword{ models }% at least one, from doc/KEYWORDS
+

So this replaces rep.octonion with \method{rep}{octonion}, removes the __ from the description (these cause latex errors), removes the hanging top-level text "Make other sections ...", and removes the link to the non-existent ~~fun~~ (I'm not sure if adding a real keyword is necessary). This passes the tests, including:

* checking S3 generic/method consistency ... OK

Conclusion: check your files. (It is a pain: been there, done that.) cheers, jari oksanen __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Why should package.skeleton() fail R CMD check?
I find it a bit peculiar that a package skeleton created with the utils function package.skeleton() fails the subsequent R CMD check. I do understand that the function is intended to produce only a skeleton, to be edited by the package author, and I think it would be defensible to say that the skeleton *should* fail the test. However, I have two arguments against intentional failure: * When you produce a skeleton, a natural thing to do is to see if it works and run R CMD check. It is baffling (but educating) if this fails. * The second argument is the more important one: if you produce a package with several functions, you want to edit one Rd file at a time and see what errors you made. You don't want to have to correct errors in other Rd files, not yet edited by you, just to see your own errors. This kind of incremental editing is much more pleasant, since satisfying the strict R checks is painful even with only your own mistakes to chase. The failure comes only from the Rd files, and it seems that the offending code is produced by the prompt.default function hidden in the utils namespace. I attach a unified diff which shows the minimal set of changes I had to make to utils:::prompt.default to produce Rd files that pass R CMD check. There are still two warnings, one on missing source files and another on missing keywords, but these are not fatal. It still produces bad-looking latex. These are the changes I made: * I replaced __description__ with description, since __ will give latex errors. * I enclosed "Make other sections" within the Note, so that it won't give an error about stray top-level text. It will now appear as a numbered latex \section{} in the dvi file, but the package author can correct that. * I replaced the reference to the non-existent function ~~fun~~ with a reference to the function help. I'm sorry for the formatting of the diff file: my emacs/ESS is cleverer than I am and changes indentation and line breaks against my will. cheers, jari oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland Ph.
+358 8 5531526, cell +358 40 5136529, fax +358 8 5531061 email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Why should package.skeleton() fail R CMD check?
On Wed, 2005-08-31 at 11:23 +0200, Martin Maechler wrote: Since you didn't use text/plain as content type, your attachment didn't make it to the list anyway, Yeah, I noticed. and you have a second chance: please use a diff -u against https://svn.R-project.org/R/trunk/src/library/utils/R/prompt.R or maybe even a diff -ubBw ... one. Here comes a unified diff against the svn source of prompt.R. I hope I made all the same changes as previously. At least package skeletons built with this pass R CMD check with the same two warnings as previously (after you beat the namespace -- oh how I hate namespaces):

--- prompt.R	2005-08-31 12:30:28.000000000 +0300
+++ prompt.R.new	2005-08-31 12:32:13.000000000 +0300
@@ -96,5 +96,5 @@
         details = c("\\details{", paste(" ~~ If necessary, more details than the",
-                    "__description__ above ~~"),
+                    "description above ~~"),
             "}"),
         value = c("\\value{",
@@ -108,11 +108,11 @@
             "literature/web site here ~ }"),
         author = "\\author{ ~~who you are~~ }",
-        note = c("\\note{ ~~further notes~~ }",
+        note = c("\\note{ ~~further notes~~ ",
             "",
             paste(" ~Make other sections like Warning with",
                 "\\section{Warning }{} ~"),
-            ""),
+            "}"),
         seealso = paste("\\seealso{ ~~objects to See Also as",
-            "\\code{\\link{~~fun~~}}, ~~~ }"),
+            "\\code{\\link{help}}, ~~~ }"),
         examples = c("\\examples{",
             "## Should be DIRECTLY executable !! ",

Cheers, Jari Oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061 email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] problem using model.frame()
On 18 Aug 2005, at 1:49, Gavin Simpson wrote: On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote: GS == Gavin Simpson [EMAIL PROTECTED] on Tue, 16 Aug 2005 18:44:23 +0100 writes: GS On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck GS wrote: On 8/16/05, Gavin Simpson [EMAIL PROTECTED] wrote: On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote: It can handle data frames like this: model.frame(y1) or model.frame(~., y1) Thanks Gabor. Yes, I know that works, but I want the function coca.formula to accept a formula like y2 ~ y1, with both y1 and y2 being data frames. [...] The expressions I gave work generally (i.e. in lm, glm, ...), not just in model.matrix, so would it be OK if the user just does this? yourfunction(y2 ~ ., y1) GS Thanks again, Gabor, for your comments. GS I'd prefer y1 ~ y2 with data frames, as this is the GS most natural way of doing things. I'd like to have (y2 GS ~ ., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) should also GS work -- silently, without any trouble. I'm sorry, Gavin, I tend to disagree quite a bit. The formula notation has quite a history in the S language, and AFAIK the idea never was to use data.frames as formula components, but rather as environments in which formula components are looked up --- exactly as Gabor has explained. Hi Martin, thanks for your comments. But then one could have a matrix of variables on the rhs of the formula and it would work -- whether this is a documented feature or an unintended side effect of matrices being stored as vectors with dims, I don't know. And whilst the formula may have a long history, a number of packages have extended the interface to implement a specific feature, in ways which don't work with standard functions like lm, glm and friends. I don't see how what I wanted to achieve is greatly different from that, or from using a matrix.
To break with such a deeply rooted principle, you should have very, very good reasons, because you would be breaking the concepts on which all other uses of formulae are based. And this would potentially lead to much confusion among your users, at least in the way they should learn to think about what formulae mean. In the end I managed to treat y1 ~ y2 (both data frames) as a special case, which allows the existing formula notation to work as well, so I can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This is what I wanted all along: to extend my interface (not do anything to R's formulae) but also to work in the traditional sense. The model I am writing code for really does model the relationship between two matrices of data. In one version of the method there is real equivalence between both sides of the formula, so it would seem odd to treat the two sides differently. At least to me ;-)

It seems that I may be responsible for one of these extensions (lhs as a data.frame in cca and rda in the vegan package). There the response (lhs) is multivariate, a multispecies community, and you must take it as a whole without manipulation (and if you tried using VGAM you would see that it really is painful to define an lhs with, say, 127 elements). However, in general you shouldn't use models that include all the 'explanatory' variables (rhs) that you happen to have by accident. So much bad science has been created with that approach, even in your field, Gav. The whole idea of a formula is the ability to choose from candidate variables, that is, to build a model. Therefore you have one-sided formulae in prcomp() and princomp(): you can say prcomp(~ x1 + log(x2) + x4, data) or prcomp(~ . - x3, data). I think you should try to keep it so. Do instead as Gabor suggested: you could have a function coca.default or coca.matrix with the interface coca.matrix(matx, maty, matz) -- or you can name this coca.default.
and coca.formula, which essentially parses your formula and returns a list of the matrices you need:

coca.formula <- function(formula, data) {
    matricesout <- parsemyformula(formula, data)
    coca(matricesout$matx, matricesout$maty, matricesout$matz)
}

Then you need the generic:

coca <- function(...) UseMethod("coca")

and it's done (but it fails in R CMD check unless you add ... to all the specific functions...). The real work is always done in coca.matrix (or coca.default), and the others just chew your data into a suitable form for your workhorse. If somebody then thinks that they need all possible variables as 'explanatory' variables (or perhaps constraints in your case), they just call the function as

coca(matx, maty, matz)

And if you have coca.data.frame they don't need 'quacking' with extra steps:

coca.data.frame <- function(dfx, dfy, dfz)
    coca(as.matrix(dfx), as.matrix(dfy), as.matrix(dfz))

This you call as coca(dfx, dfy, dfz), and there you go. The essential feature of a formula is the ability to define the model. Don't give it away. cheers, jazza -- Jari Oksanen
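Putting the pieces above together, here is a self-contained sketch of the dispatch pattern. All the names (coca, etc.) are the hypothetical ones from the discussion, the formula parsing is one possible implementation (lhs evaluated in the formula's environment, rhs via model.matrix), and the "analysis" itself is just a placeholder:

```r
## Sketch of the generic / default / formula pattern discussed above.
coca <- function(x, ...) UseMethod("coca")

coca.default <- function(x, y, ...) {
  ## the workhorse: everything arrives here as matrices
  list(x = as.matrix(x), y = as.matrix(y))  # placeholder for the real work
}

coca.formula <- function(formula, data, ...) {
  ## lhs: a whole data frame, looked up where the formula was written
  lhs <- eval(formula[[2L]], environment(formula))
  ## rhs: a model matrix built from the variables chosen in 'data'
  rhs <- model.matrix(delete.response(terms(formula, data = data)), data)
  coca(lhs, rhs, ...)
}
```

With `y1` a data frame and `dat` holding `x1` and `x2`, both `coca(y1, dat)` and `coca(y1 ~ x1 + x2, data = dat)` end up in coca.default, which is exactly the division of labour recommended above: the methods only chew the data, the default does the work.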