Re: [R-pkg-devel] R CMD check works but with devtools::check() examples fail

2019-05-16 Thread Jari Oksanen
I think this is because the check systems set different environment 
variables. I had the same problem in February, and found out that R 3.6.0 (then 
still to come) adds a new environment variable _R_CHECK_LENGTH_1_LOGIC2_. This 
*is* documented, but the documentation is well hidden in the R Internals manual. It 
says (or said when I looked at this in early February):

_R_CHECK_LENGTH_1_LOGIC2_
Optionally check if either argument of the binary operators && and || has 
length greater than one. The format is the same as for 
_R_CHECK_LENGTH_1_CONDITION_. Default: unset (nothing is reported)

R has long required that the condition in if(A && B) has length 1, but with 
this variable set, it also requires that both A and B have length one (which is not 
the same thing). You need to find the place where this is violated and fix 
it. If you look at the end of the error message, it even says in which call 
you have a length > 1 component in your condition (it is reported as length 3 in the 
diagnostic output).
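
To see the difference, here is a minimal sketch (the object names are made up) of a
condition that passes the old length-1 check but trips _R_CHECK_LENGTH_1_LOGIC2_:

ok <- TRUE
flags <- c(TRUE, TRUE, FALSE)       # length 3, as in your diagnostic output
if (ok && all(flags)) "fine"        # both arguments of && have length 1: accepted
if (ok && flags) "not fine"         # 'flags' has length 3: only its first element
                                    # is used, and with the variable set this is reported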

I found this at the time because win-builder set this environment variable, and 
there may be other build systems that do the same. You should fix these cases to 
avoid trouble.

Cheers, Jari Oksanen

On 16 May 2019, at 13:08, Gábor Csárdi <csardi.ga...@gmail.com> wrote:

On Thu, May 16, 2019 at 10:56 AM Jack O. Wasey <j...@jackwasey.com> wrote:
Agree with Dirk, and also you are running R CMD check on the current
directory,

Why do you think so? Don't the lines below the "-- Building" header
mean that devtools/rcmdcheck is building the package?

G.

[...]
── Building ─────────────────────────────────────────────────── rdtLite ──
Setting env vars:
● CFLAGS: -Wall -pedantic -fdiagnostics-color=always
● CXXFLAGS  : -Wall -pedantic -fdiagnostics-color=always
● CXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always

✔  checking for file ‘/Users/blerner/git/rdtLite.check/rdtLite.Rcheck/00_pkg_src/rdtLite/DESCRIPTION’
─  preparing ‘rdtLite’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘rdtLite_1.0.3.tar.gz’

── Checking ─
[...]
__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] future time stamps warning

2018-09-20 Thread Jari Oksanen
Could this be a timezone issue (the timezone set on the local computer 
and communicated to CRAN)? When I look at the email on my computer 
I see:



On Thu, Sep 20, 2018 at 11:46 AM Leo Lahti  wrote:

-rwxr-xr-x lei/lei      1447 2018-09-20 13:23 eurostat/DESCRIPTION


Which seems to claim that eurostat/DESCRIPTION was nearly three hours 
newer than the email. That clearly was in the future back then.


If so, waiting a couple of hours before submission could help, and there 
should be a cleaner solution, too (i.e., CRAN and you agree on the 
timezone, or both use the same one, such as UTC).
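
A quick way to spot the problem before submitting is to compare the file
modification times against the current time on the same clock (a rough sketch;
run in the package source directory):

files <- list.files(".", recursive = TRUE)
info  <- file.info(files)
files[info$mtime > Sys.time()]   # any file listed here has a time stamp in the future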


Cheers, Jari Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Jari Oksanen
If you have to load two packages which both export the same name from their 
namespaces, namespaces do not help in resolving which of the synonymous functions to 
use. Neither does it help to have a package instead of a script, as long as you 
end up loading two namespaces with name conflicts. The order of loading 
namespaces can also be difficult to control, because you may already load a 
namespace when you start R with a saved workspace. Moving a 
function to another package may be a transitional issue which disappears when 
both packages are at their final stages, but if you use the recommended 
deprecation process, the same names can live together for a long time. So this 
package is a good idea, and preferably base R should be able to handle the 
issue of choosing between exported synonymous functions.

This has bitten me several times in package development, and with a growing CRAN 
it is a growing problem. Package authors often have poor control over the issue, 
as they do not know which packages users use. Now all we can do is keep a FAQ that 
explains that a certain error message does not come from a function in our 
package, but from some other package whose identically named function was used 
instead.

cheers, Jari Oksanen

On 23 Aug 2018, at 23:46, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

First, some general comments:

This sounds like a useful package.

I would guess it has very little impact on runtime efficiency except when 
attaching a new package; have you checked that?

I am not so sure about your heuristics.  Can they be disabled, so the user is 
always forced to make the choice?  Even when a function is intended to adhere 
to the superset principle, its authors don't always get it right, so a really careful 
user should always do explicit disambiguation.

And of course, if users wrote most of their long scripts as packages instead of 
as long scripts, the ambiguity issue would arise far less often, because 
namespaces in packages are intended to solve the same problem as your package 
does.

One more comment inline about a typo, possibly in an error message.

Duncan Murdoch

On 23/08/2018 2:31 PM, Hadley Wickham wrote:
Hi all,
I’d love to get your feedback on the conflicted package, which provides an
alternative strategy for resolving ambiguous function names (i.e. when
multiple packages provide identically named functions). conflicted 0.1.0
is already on CRAN, but I’m currently preparing a revision
(<https://github.com/r-lib/conflicted>), and looking for feedback.
As you are no doubt aware, R’s default approach means that the most
recently loaded package “wins” any conflicts. You do get a message about
conflicts on load, but I see a lot of newer R users experiencing problems
caused by function conflicts. I think there are three primary reasons:
-   People don’t read messages about conflicts. Even if you are
conscientious and do read the messages, it’s hard to notice a single
new conflict caused by a package upgrade.
-   The warning and the problem may be quite far apart. If you load all
your packages at the top of the script, it may potentially be 100s
of lines before you encounter a conflict.
-   The error messages caused by conflicts are cryptic because you end
up calling a function with utterly unexpected arguments.
For these reasons, conflicted takes an alternative approach, forcing the
user to explicitly disambiguate any conflicts:
library(conflicted)
library(dplyr)
library(MASS)
select
#> Error: [conflicted] `select` found in 2 packages.
#> Either pick the one you want with `::`
#> * MASS::select
#> * dplyr::select
#> Or declare a preference with `conflicted_prefer()`
#> * conflict_prefer("select", "MASS")
#> * conflict_prefer("select", "dplyr")

I don't know if this is a typo in your r-devel message or a typo in the error 
message, but you say `conflicted_prefer()` in one place and conflict_prefer() 
in the other.

conflicted works by attaching a new “conflicted” environment just after
the global environment. This environment contains an active binding for
each ambiguous binding. The conflicted environment also contains
bindings for `library()` and `require()` that rebuild the conflicted
environment and suppress the default reporting (but are otherwise thin wrappers
around the base equivalents).
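
A minimal sketch of the active-binding idea (not the package's actual code;
`select` is just an example name):

disamb <- new.env()
makeActiveBinding("select", function() {
  stop("[conflicted] `select` found in 2 packages; use MASS::select or dplyr::select",
       call. = FALSE)
}, disamb)
get("select", envir = disamb)   # touching the bare name triggers the error
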
conflicted also provides a `conflict_scout()` helper which you can use
to see what’s going on:
conflict_scout(c("dplyr", "MASS"))
#> 1 conflict:
#> * `select`: dplyr, MASS
conflicted applies a few heuristics to minimise false positives (at the
cost of introducing a few false negatives). The overarching goal is to
ensure that code behaves identically regardless of the order in which
packages are attached.
-   A number of packages provide a function that appears to conflict
with a function in a base package, but the

Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-18 Thread Jari Oksanen
I am afraid that these suggestions may not work. There are more choices than 
Win32 and Win64, including several flavours of BLAS/Lapack which probably are 
involved if you evaluate eigenvalues, and also differences in hardware, 
compilers and the phase of the moon.  If there are several equal eigenvalues, any 
choice of axes is arbitrary, and it can be made stable for testing only by 
chance. If you have M equal eigenvalues, you should try to find a test that the 
M-dimensional (sub)space is approximately correct irrespective of the random 
orientation of the axes within this subspace.
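
One way to do that is to compare projection matrices onto the subspace instead of
the eigenvectors themselves, since the projection is unique even when the basis is
not (a sketch; the column indices of the repeated eigenvalue are assumed known):

## e.ref and e.new are eigen() results from two platforms; idx indexes the
## eigenvectors belonging to one repeated eigenvalue
subspace_equal <- function(e.ref, e.new, idx, tol = 1e-8) {
    Vr <- e.ref$vectors[, idx, drop = FALSE]
    Vn <- e.new$vectors[, idx, drop = FALSE]
    isTRUE(all.equal(tcrossprod(Vr), tcrossprod(Vn), tolerance = tol))
}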

Cheers, Jari Oksanen

On 18 May 2018, at 00:06 am, Kevin Coombes 
<kevin.r.coom...@gmail.com<mailto:kevin.r.coom...@gmail.com>> wrote:

Yes; but I have been running around all day without time to sit down and
try them. The suggestions make sense, and I'm looking forward to
implementing them.

On Thu, May 17, 2018, 3:55 PM Ben Bolker 
<bbol...@gmail.com<mailto:bbol...@gmail.com>> wrote:

There have been various comments in this thread (by me, and I think
Duncan Murdoch) about how you can identify the platform you're running
on (some combination of .Platform and/or R.Version()) and use it to
write conditional statements so that your tests will only be compared
with reference values that were generated on the same platform ... did
those get through?  Did they make sense?

On Thu, May 17, 2018 at 3:30 PM, Kevin Coombes
<kevin.r.coom...@gmail.com<mailto:kevin.r.coom...@gmail.com>> wrote:
Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are
the issue. The matrices I am using are all nonsingular, and the various
algorithms have no problem computing the eigenvalues correctly (up to
numerical errors that I can bound and thus account for in tests by rounding
appropriately). But an eigenvalue of multiplicity M has an M-dimensional
eigenspace with no preferred basis. So, any M-dimensional (unitary) change
of basis is permitted. That's what gives rise to the lack of reproducibility
across architectures. The choice of basis appears to use different
heuristics on 32-bit Windows than on 64-bit Windows or Linux machines. As a
result, I can't include the tests I'd like as part of a CRAN submission.

On Thu, May 17, 2018, 2:29 PM William Dunlap 
<wdun...@tibco.com<mailto:wdun...@tibco.com>> wrote:

Your explanation needs to be a bit more general in the case of identical
eigenvalues - each distinct eigenvalue has an associated subspace whose
dimension is the number of repeats of that eigenvalue, and the eigenvectors for
that eigenvalue are an orthonormal basis for that subspace.  (With no
repeated eigenvalues this gives your 'unique up to sign'.)

E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of 0

x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    1
[2,]    0    1    0    0    1
[3,]    0    0    1    0    1
[4,]    0    0    0    0    0
[5,]    1    1    1    0    3
the following give valid but different (by more than sign) eigen vectors

e1 <- structure(list(values = c(4, 1, 0.999, 0,
-2.22044607159862e-16
), vectors = structure(c(-0.288675134594813, -0.288675134594813,
-0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
-0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
-0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
-0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values",
"vectors"), class = "eigen")
e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
   vectors = structure(c(0.288675134594813, 0.288675134594813,
   0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
   0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
   0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
   -1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5,
   0, 0.5), .Dim = c(5L, 5L))), .Names = c("values", "vectors"
), class = "eigen")

I.e.,
all.equal(crossprod(e1$vectors), diag(5), tol=0)
[1] "Mean relative difference: 1.407255e-15"
all.equal(crossprod(e2$vectors), diag(5), tol=0)
[1] "Mean relative difference: 3.856478e-15"
all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)
[1] "Mean relative difference: 1.110223e-15"
all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)
[1] "Mean relative difference: 9.069735e-16"

e1$vectors
           [,1]       [,2]          [,3] [,4] [,5]
[1,] -0.2886751  0.0000000  8.164966e-01    0 -0.5
[2,] -0.2886751  0.7071068 -4.082483e-01    0 -0.5
[3,] -0.2886751 -0.7071068 -4.082483e-01    0 -0.5
[4,]  0.0000000  0.0000000  0.000000e+00   -1  0.0
[5,] -0.8660254  0.0000000 -6.106227e-16    0  0.5
e2$vectors
          [,1]          [,2]          [,3] [,4] [,5]
[1,] 0.2886751 -7.844376e-01  2.265489e-01    0 -0.5
[2,] 0.2886751  

Re: [Rd] importing namespaces from base packages

2018-03-13 Thread Jari Oksanen
It seems that they are defined in tools/R/check.R. For instance, lines 
363-364 say:


## The default set of packages here are as they are because
## .get_S3_generics_as_seen_from_package needs utils,graphics,stats

and then on lines 368 (Windows) and 377 (other OS) it has:
"R_DEFAULT_PACKAGES=utils,grDevices,graphics,stats"

So these pass R CMD check and are an "industrial standard". Changing 
this would break half of the CRAN packages.
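
If you want to see the effect yourself, a rough sketch (assuming a Unix-alike
where environment variables can be set on the command line):

## launch a child R process with the same default packages as R CMD check
## and look at what gets attached there
cmd <- paste0("R_DEFAULT_PACKAGES=utils,grDevices,graphics,stats ",
              R.home("bin"), "/Rscript -e 'print(search())'")
system(cmd)   # 'methods' is not attached there, unlike utils, grDevices, graphics and stats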


Cheers, Jari Oksanen

On 13/03/18 13:47, Martin Maechler wrote:

Adrian Dușa <dusa.adr...@unibuc.ro>
 on Tue, 13 Mar 2018 09:17:08 +0200 writes:


 > On Mon, Mar 12, 2018 at 2:18 PM, Martin Maechler 
<maech...@stat.math.ethz.ch>
 > wrote:
 >> [...]
 >> Is that so?   Not according to my reading of the 'Writing R
 >> Extensions' manual, nor according to what I have been doing in
 >> all of my packages for ca. 2 years:
 >>
 >> The rule I have in my mind is
 >>
 >> 1) NAMESPACE Import(s|From) \
 >>  <==>  DESCRIPTION -> 'Imports:'
 >> 2) .. using "::" in  R code /
 >>
 >>
 >> If you really found that you did not have to import from say
 >> 'utils', I think this was a *un*lucky coincidence.

 > Of course, the importFrom() is mandatory in NAMESPACE otherwise the 
package
 > does not pass the checks.
 > The question was related to the relation between the packages mentioned 
in
 > the NAMESPACE and the packages mentioned in the Imports: field from
 > DESCRIPTION.

 > For instance, the current version 3.1 of package QCA on CRAN mentions in
 > the DESCRIPTION:

 > Imports: venn (≥ 1.2), shiny, methods, fastdigest

 > while the NAMESPACE file has:

 > import(shiny)
 > import(venn)
 > import(fastdigest)
 > importFrom("utils", "packageDescription", "remove.packages",
 > "capture.output")
 > importFrom("stats", "glm", "predict", "quasibinomial", "binom.test",
 > "cutree", "dist", "hclust", "na.omit", "dbinom", "setNames")
 > importFrom("grDevices", "dev.cur", "dev.new", "dev.list")
 > importFrom("graphics", "abline", "axis", "box", "mtext", "par", "title",
 > "text")
 > importFrom("methods", "is")

 > There are functions from packages utils, stats, grDevices and graphics 
for
 > which the R checks do not require a specific entry in the Imports: field.
 > I suspect because all of these packages are part of the base R, but so is
 > package methods. The question is why is it not mandatory for those 
packages
 > to be mentioned in the Imports: field from DESCRIPTION, while removing
 > package methods from that field runs into an error, despite maintaining 
the
 > package in the NAMESPACE's importFrom().


Thank you, Adrian,  for clarification of your question.
As a matter of fact, I was not aware of what you showed above,
and personally I think I do add every package/namespace mentioned in
NAMESPACE to the DESCRIPTION's  "Imports:" field.

AFAIK the above phenomenon is not documented, and rather the
docs would imply that this phenomenon might go away -- I for one
would vote for more consistency here ..

Martin

 >> [...]
 >> There are places in the R source where it is treated specially,
 >> indeed, part of 'methods' may be needed when it is neither
 >> loaded nor attached (e.g., when R runs with only base, say, and
 >> suddenly encounters an S4 object), and there still are
 >> situations where 'methods' needs to be in the search() path and
 >> not just loaded, but these cases should be unrelated to the
 >> above DESCRIPTION-Imports vs NAMESPACE-Imports correspondence.

 > This is what I had expected myself, then the above behavior has to have
 > another explanation.
 > It is just a curiosity, there is naturally nothing wrong with maintaining
 > package methods in the Imports: field. Only odd why some base R packages
 > are treated differently than other base R packages, at the package checks
 > stage.

 > Thank you,
 > Adrian

 > --
 > Adrian Dusa
 > University of Bucharest
 > Romanian Social Data Archive
 > Soseaua Panduri nr. 90-92
 > 050663 Bucharest sector 5
 > Romania
 > https://adriandusa.eu

 > [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Why R should never move to git

2018-01-25 Thread Jari Oksanen

This is exactly the instruction given in  https://xkcd.com/1597/

cheers, J.O.

On 25/01/18 14:48, Mario Emmenlauer wrote:

Hi Duncan!

I think there are many users whose first experiences with git were frustrating,
and trust me, many people here can relate to your pain. I can certainly say that
I can. At first, git takes significant effort before you become fluent in seemingly
"simple" tasks. I can literally feel your pain right now.


But this is the main downside of git: it can be hard to learn. I overcame
this problem by collecting copy-paste instructions for the most common tasks.
I think Dirk provided a very nice starting point for a typical pull request, and
next time you need to use git, maybe try his instructions. They are *exactly* what
I use at least once a week. However, they are not 1:1 applicable to your current
situation, where you have already started a fork.

If you want to solve your current "mess", I personally find it easiest to
move all local changes away (to /tmp/ or wherever), trash the GitHub fork, and
start over with Dirk's instructions. At point (4) you can copy your changed files
back from /tmp/ and use them for new commits in this new, clean branch.

Everything else should just work.

Cheers,

 Mario




On 25.01.2018 13:09, Duncan Murdoch wrote:

On 25/01/2018 6:49 AM, Dirk Eddelbuettel wrote:

On 25 January 2018 at 06:20, Duncan Murdoch wrote:
| On 25/01/2018 2:57 AM, Iñaki Úcar wrote:
| > For what it's worth, this is my workflow:
| >
| > 1. Get a fork.
| > 2. From the master branch, create a new branch called fix-[something].
| > 3. Put together the stuff there, commit, push and open a PR.
| > 4. Checkout master and repeat from 2 to submit another patch.
| >
| > Sometimes, I forget the step of creating the new branch and I put my
| > fix on top of the master branch, which complicates things a bit. But
| > you can always rename your fork's master and pull it again from
| > upstream.
|
| I saw no way to follow your renaming suggestion.  Can you tell me the
| steps it would take?  Remember, there's already a PR from the master
| branch on my fork.  (This is for future reference; I already followed
| Gabor's more complicated instructions and have solved the immediate
| problem.)

1)  Via GUI: fork or clone at github so that you have URL to use in 2)

Github would not allow me to fork, because I already had a fork of the same 
repository.  I suppose I could have set up a new user and done it.

I don't know if cloning the original would have made a difference. I don't have 
permission to commit to the original, and the manipulateWidget maintainers
wouldn't be able to see my private clone, so I don't see how I could create a 
PR that they could use.

Once again, let me repeat:  this should be an easy thing to do.  So far I'm 
pretty convinced that it's actually impossible to do it on the Github website
without hacks like creating a new user.  It's not trivial but not that 
difficult for a git expert using command line git.

If R Core chose to switch the R sources to use git and used Github to host a 
copy, problems like mine would come up fairly regularly.  I don't think R Core
would gain enough from the switch to compensate for the burden of dealing with 
these problems.

Maybe Gitlab or some other front end would be better.

Duncan Murdoch


2)  Run
        git clone giturl
    to fetch a local instance

3)  Run
        git checkout -b feature/new_thing_a
    (this is 2. above by Inaki)

4)  Edit, save, compile, test, revise, ... leading to 1 or more commits

5)  Run
        git push origin
    standard configuration should have the remote branch follow the local branch; I
    think the "long form" is
        git push --set-upstream origin feature/new_thing_a

6)  Run
        git checkout -
    or
        git checkout master
    and you are back in master. Now you can restart at my 3) above for
    branches b, c, d and create independent pull requests

I find it really useful to have a bash prompt that shows the branch:

  edd@rob:~$ cd git/rcpp
  edd@rob:~/git/rcpp(master)$ git checkout -b feature/new_branch_to_show
  Switched to a new branch 'feature/new_branch_to_show'
  edd@rob:~/git/rcpp(feature/new_branch_to_show)$ git checkout -
  Switched to branch 'master'
  Your branch is up-to-date with 'origin/master'.
  edd@rob:~/git/rcpp(master)$ git branch -d feature/new_branch_to_show
  Deleted branch feature/new_branch_to_show (was 5b25fe62).
  edd@rob:~/git/rcpp(master)$

There are a few tutorials out there about how to do it; I once got mine from
Karthik when we did a Software Carpentry workshop.  Happy to detail off-list,
it adds less than 10 lines to ~/.bashrc.

Dirk

|
| Duncan Murdoch
|
| > Iñaki
| >
| >
| >
| > 2018-01-25 0:17 GMT+01:00 Duncan Murdoch :
| >> Lately I've been doing some work with the manipulateWidget package, which
| >> lives on Github at
| >> 

Re: [Rd] Are r2dtable and C_r2dtable behaving correctly?

2017-08-25 Thread Jari Oksanen
It is not about "really arge total number of observations", but:

set.seed(4711); tabs <- r2dtable(1e6, c(2, 2), c(2, 2))
A11 <- vapply(tabs, function(x) x[1, 1], numeric(1)); table(A11)

A11
     0      1      2 
166483 666853 166664 

There are three possible matrices, and these come out in proportions 1:4:1, the 
one with all cells filled with ones being
most common.
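
Those proportions agree with the exact conditional distribution given the margins,
which for a 2 x 2 table is hypergeometric (a small check, assuming that is the
distribution Patefield's algorithm is meant to sample from):

dhyper(0:2, 2, 2, 2)   # 0.1666667 0.6666667 0.1666667, i.e. proportions 1:4:1
table(A11) / 1e6       # the observed frequencies from the simulation above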

Cheers, Jari O.

From: R-devel  on behalf of Martin Maechler 

Sent: 25 August 2017 11:30
To: Gustavo Fernandez Bayon
Cc: r-devel@r-project.org
Subject: Re: [Rd] Are r2dtable and C_r2dtable behaving correctly?

> Gustavo Fernandez Bayon 
> on Thu, 24 Aug 2017 16:42:36 +0200 writes:

> Hello,
> While doing some enrichment tests using chisq.test() with simulated
> p-values, I noticed some strange behaviour. The computed p-value was
> extremely small, so I decided to dig a little deeper and debug
> chisq.test(). I noticed then that the simulated statistics returned by the
> following call

> tmp <- .Call(C_chisq_sim, sr, sc, B, E)

> were all the same, very small numbers. This, at first, seemed strange to
> me. So I decided to do some simulations myself, and started playing around
> with the r2dtable() function. Problem is, using my row and column
> marginals, r2dtable() always returns the same matrix. Let's provide a
> minimal example:

> rr <- c(209410, 276167)
> cc <- c(25000, 460577)
> ms <- r2dtable(3, rr, cc)

> I have tested this code in two machines and it always returned the same
> list of length three containing the same matrix three times. The repeated
> matrix is the following:

> [[1]]
> [,1]   [,2]
> [1,] 10782 198628
> [2,] 14218 261949

> [[2]]
> [,1]   [,2]
> [1,] 10782 198628
> [2,] 14218 261949

> [[3]]
> [,1]   [,2]
> [1,] 10782 198628
> [2,] 14218 261949

Yes.  You can also do

   unique(r2dtable(100, rr, cc))

and see that the result is constant.

I'm pretty sure this is still due to some integer overflow,

in spite of the fact that I had spent quite some time to fix
such problem in Dec 2003, see the 14 years old bug PR#5701
  https://bugs.r-project.org/bugzilla/show_bug.cgi?id=5701#c2

It has to be said that this is based on an algorithm published
in 1981, specifically - from  help(r2dtable) -

 Patefield, W. M. (1981) Algorithm AS159.  An efficient method of
 generating r x c tables with given row and column totals.
 _Applied Statistics_ *30*, 91-97.

   For those with JSTOR access (typically via your University),
   available at http://www.jstor.org/stable/2346669

When I start reading it, indeed the algorithm seems start from the
expected value of a cell entry and then "explore from there"...
and I do wonder if there is not a flaw somewhere in the
algorithm:

I've now found that a bit more than a year ago, 'paljenczy' found on SO
  
https://stackoverflow.com/questions/37309276/r-r2dtable-contingency-tables-are-too-concentrated
that indeed the generated tables seem to be too much around the mean.
Basically his example:

https://stackoverflow.com/questions/37309276/r-r2dtable-contingency-tables-are-too-concentrated


> set.seed(1); system.time(tabs <- r2dtable(1e6, c(100, 100), c(100, 100))); 
> A11 <- vapply(tabs, function(x) x[1, 1], numeric(1))
   user  system elapsed
  0.218   0.025   0.244
> table(A11)

    34     35     36     37     38     39     40     41     42     43 
     2     17     40    129    334    883   2026   4522   8766  15786 
    44     45     46     47     48     49     50     51     52     53 
 26850  42142  59535  78851  96217 107686 112438 108237  95761  78737 
    54     55     56     57     58     59     60     61     62     63 
 59732  41474  26939  16006   8827   4633   2050    865    340    116 
    64     65     66     67 
    38     13      7      1 
>

For a  2x2  table, there's really only one degree of freedom,
hence the above characterizes the full distribution for that
case.

I would have expected to see all possible values in  0:100
instead of such a "normal like" distribution with carrier only
in [34, 67].

There are newer publications and maybe algorithms.
So maybe the algorithm is "flawed by design" for a really large
total number of observations, rather than wrong.
Seems interesting ...

Martin Maechler

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] registering Fortran routines in R packages

2017-05-10 Thread Jari Oksanen
Have you tried using tools:::package_native_routine_registration_skeleton()? If 
you don't like its output, you can easily edit its results and still avoid most 
pitfalls.
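
For example (a minimal sketch, assuming the working directory is the package
source tree):

## writes registration boilerplate, including the Fortran entries, that can be
## copied to src/init.c and edited
tools:::package_native_routine_registration_skeleton(".")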

Cheers, Jari Oksanen

From: R-devel <r-devel-boun...@r-project.org> on behalf of Berend Hasselman 
<b...@xs4all.nl>
Sent: 10 May 2017 09:48
To: Christophe Dutang
Cc: r-devel@r-project.org
Subject: Re: [Rd] registering Fortran routines in R packages

Christophe,

> On 10 May 2017, at 08:08, Christophe Dutang <duta...@gmail.com> wrote:
>
> Thanks for your email.
>
> I tried to change the name to lowercase but it conflicts with a C 
> implementation also named halton. So I renamed the C functions halton2() and 
> sobol2() while the Fortran functions are HALTON() and SOBOL() (I also tried 
> lower case in the Fortran code). Unfortunately, it does not help since I get
>
> init.c:97:25: error: use of undeclared identifier 'halton_'; did you mean 
> 'halton2'?
>   {"halton", (DL_FUNC) _SUB(halton),  7},
>
> My current solution is to comment out the FortEntries array and use 
> R_useDynamicSymbols(dll, TRUE) for a dynamic search of Fortran routines.

Have a look at my package geigen and its init.c.
Could it be that you are missing extern declarations for the Fortran routines?


Berend

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Some "lm" methods give wrong results when applied to "mlm" objects

2017-04-04 Thread Jari Oksanen
I had a look at some influence measures, and it seems to me that several 
methods currently handle multiple-response lm ("mlm") objects wrongly in R. In some 
cases there are separate "mlm" methods, but usually "mlm" objects are handled by the 
same methods as univariate "lm" objects, and in some cases this fails.

There are two general patterns of problems in influence measures:

1) The univariate methods assume that the overall standard deviation (sd) has 
length one, but for "mlm" models we have a multivariate response with a 
multicolumn residual matrix. The functions do get the sd vector corresponding to 
the columns correctly, but it is not applied column-wise: it is recycled over 
rows. This affects rstandard.lm and cooks.distance.lm. For instance, in 
cooks.distance.lm we have ((res/(sd * (1 - hat)))^2 * hat)/p, where res is an n 
x m matrix, sd is an m-vector and hat is an n-vector.  Both of these functions 
are very easily fixed.
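
A minimal sketch of the column-wise computation (the object names are mine, not
those of the stats code):

y   <- as.matrix(iris[, 1:2])
fit <- lm(y ~ Species, data = iris)              # an "mlm" fit
res <- residuals(fit)                            # n x m matrix
h   <- lm.influence(fit, do.coef = FALSE)$hat    # n-vector
p   <- fit$rank
sds <- sqrt(deviance(fit) / df.residual(fit))    # m-vector: one sigma per response
## apply each column's sd to its own column instead of recycling over rows
cooks <- (sweep(res, 2, sds, "/") / (1 - h))^2 * h / p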

2) Another problem is that several functions are based on the lm.influence 
function, and it seems that it returns elements sigma and coefficients that are 
only based on the first variable (the first column of the residual matrix wt.res) 
and gives wrong results for the other variables. This affects the functions 
dfbeta.lm (coefficients), dfbetas.lm (coefficients, sigma), dffits (sigma), 
rstudent.lm (sigma) and covratio (sigma). lm.influence computes these elements in 
compiled code and this is harder to fix. MASS (the book & the package) avoids 
using compiled code in its (univariate) studentized residuals, and instead 
uses a clever short-cut.

In addition to these, there are a couple of other cases which seem to fail with 
"mlm" models: 

confint.lm gives empty result, because the length of the results is defined by 
names(coef(object)) which is NULL because "mlm" objects return a matrix of 
coefficients instead of a vector with names.

dummy.coef fails because "mlm" objects do not have xlevels item.

extractAIC.lm returns only one value instead of a vector, and edf is 
misleading. Separate deviance.mlm returns a vector of deviances, and logLik.lm 
returns "'logLik.lm' does not support multiple responses". Probably 
extractAIC.lm should work like logLik.lm.

Several methods already handle "mlm" objects by returning the message "  is not 
yet implemented for multivariate lm()", which of course is a natural and correct 
solution to the problems.

Cheers, Jari Oksanen
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] typo or stale info in qr man

2016-10-25 Thread Jari Oksanen
And that missing functionality is that the Linpack/Lapack routines do not return 
the rank and have a different style of pivoting? In other aspects, the 
user interface is very similar in dqrdc2 in R and in dqrdc in Linpack. Another 
difference seems to be that the final pivoting reported to the user differs: 
R keeps the original order except for aliased variables, but Linpack 
makes either wild shuffling or no pivoting at all. I haven't looked at dgeqp3 in 
Lapack, but it appears not to return the rank either (I don't know about shuffling 
the columns). It seems that using Linpack dqrdc directly is not always compatible 
with dqrdc2 of R although it returns similar objects. That is, when packing up 
the Linpack result to produce an object with the same items as qr.default (qr, 
rank, qraux, pivot, class "qr"), the resulting object may not yield similar 
results in base::qr.fitted, base::qr.resid etc. as the base::qr.default result (but 
I haven't had time for thorough testing).

This is how I tried to do the packing (apologies for clumsy coding):

SEXP do_QR(SEXP x, SEXP dopivot)
{
/* set up */
int i;
int nr = nrows(x), nx = ncols(x);
int pivoting = asInteger(dopivot);
SEXP qraux = PROTECT(allocVector(REALSXP, nx));
SEXP pivot = PROTECT(allocVector(INTSXP, nx));
/* do pivoting or keep the order of columns? */
if (pivoting)
memset(INTEGER(pivot), 0, nx * sizeof(int));
else
for(i = 0; i < nx; i++)
INTEGER(pivot)[i] = i+1;
double *work = (double *) R_alloc(nx, sizeof(double));
int job = 1;
x = PROTECT(duplicate(x));

/* QR decomposition with Linpack */
F77_CALL(dqrdc)(REAL(x), &nr, &nr, &nx, REAL(qraux),
INTEGER(pivot), work, &job);

/* pack up */
SEXP qr = PROTECT(allocVector(VECSXP, 4));
SEXP labs = PROTECT(allocVector(STRSXP, 4));
SET_STRING_ELT(labs, 0, mkChar("qr"));
SET_STRING_ELT(labs, 1, mkChar("rank"));
SET_STRING_ELT(labs, 2, mkChar("qraux"));
SET_STRING_ELT(labs, 3, mkChar("pivot"));
setAttrib(qr, R_NamesSymbol, labs);
SEXP cl = PROTECT(allocVector(STRSXP, 1));
SET_STRING_ELT(cl, 0, mkChar("qr"));
classgets(qr, cl);
UNPROTECT(2); /* cl, labs */
SET_VECTOR_ELT(qr, 0, x);
SET_VECTOR_ELT(qr, 1, ScalarInteger(nx)); /* not really the rank,
 but no. of columns */
SET_VECTOR_ELT(qr, 2, qraux);
SET_VECTOR_ELT(qr, 3, pivot);
UNPROTECT(4); /* qr, x, pivot, qraux */
return qr;
}


cheers, Jari Oksanen

From: R-devel <r-devel-boun...@r-project.org> on behalf of Martin Maechler 
<maech...@stat.math.ethz.ch>
Sent: 25 October 2016 11:08
To: Wojciech Musial (Voitek)
Cc: R-devel@r-project.org
Subject: Re: [Rd] typo or stale info in qr man

>>>>> Wojciech Musial (Voitek) <wojciech.mus...@gmail.com>
>>>>> on Mon, 24 Oct 2016 15:07:55 -0700 writes:

> man for `qr` says that the function uses LINPACK's DQRDC, while it in
> fact uses DQRDC2.

which is a modification of LINPACK's DQRDC.

But you are right, and I have added to the help file (and a tiny
bit to the comments in the Fortran source).

When this change was done > 20 years ago, it was still hoped
that the numerical linear algebra community or more specifically
those behind LAPACK would eventually provide this functionality
with LAPACK (and we would then use that),
but that has never happened according to my knowledge.

Thank you for the 'heads up'.

Martin Maechler
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] summary( prcomp(*, tol = .) ) -- and 'rank.'

2016-03-25 Thread Jari Oksanen

> On 25 Mar 2016, at 11:45 am, peter dalgaard <pda...@gmail.com> wrote:
> 
>> 
>> On 25 Mar 2016, at 10:08 , Jari Oksanen <jari.oksa...@oulu.fi> wrote:
>> 
>>> 
>>> On 25 Mar 2016, at 10:41 am, peter dalgaard <pda...@gmail.com> wrote:
>>> 
>>> As I see it, the display showing the first p << n PCs adding up to 100% of 
>>> the variance is plainly wrong. 
>>> 
>>> I suspect it comes about via a mental short-circuit: If we try to control p 
>>> using a tolerance, then that amounts to saying that the remaining PCs are 
>>> effectively zero-variance, but that is (usually) not the intention at all. 
>>> 
>>> The common case is that the remainder terms have a roughly _constant_, 
>>> small-ish variance and are interpreted as noise. Of course the magnitude of 
>>> the noise is important information.  
>>> 
>> But then you should use Factor Analysis which has that concept of “noise” 
>> (unlike PCA).
> 
> Actually, FA has a slightly different concept of noise. PCA can be 
> interpreted as a purely technical operation, but also as an FA variant with 
> same variance for all components.
> 
> Specifically, FA is 
> 
> Sigma = LL' + Psi
> 
> with Psi a diagonal matrix. If Psi = sigma^2 I , then L can be determined (up 
> to rotation) as the first p components of PCA. (This is used in ML algorithms 
> for FA since it allows you to concentrate the likelihood to be a function of 
> Psi.)
> 
If I remember correctly, we took a correlation matrix and replaced the diagonal 
elements with variable “communalities” < 1 estimated by some trick, and then 
chucked that matrix into PCA and called the result FA. A more advanced way was 
to do this iteratively: take some first axes of PCA/FA, calculate the diagonal 
elements from them & re-feed them into PCA. It was done like that because the 
algorithms & computers were not strong enough for real FA. Now they are, and I 
think it would be better to treat PCA like PCA, at least in the default output 
of the standard stats::summary function. So summary should show the proportion of 
total variance (for people who think this is a cool thing to know) instead of 
showing a proportion of an unspecified part of the variance.
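
For example, along the lines of Martin's original example (a sketch, not a
proposed implementation):

set.seed(1)
Z <- matrix(rnorm(1000 * 32), 1000, 32) %*% chol(toeplitz(0.9^(0:31)))
pZ <- prcomp(Z, tol = 0.1)
length(pZ$sdev)                  # fewer than 32 components retained
totvar <- sum(apply(Z, 2, var))  # total variance of all 32 original columns
pZ$sdev^2 / totvar               # proportion of *total* variance per retained PC
sum(pZ$sdev^2) / totvar          # clearly < 1, unlike the cumulative proportion shown by summary()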

Cheers, Jari Oksanen (who now switches to listening to today’s Passion instead 
of continuing with PCA)


> Methods like PC regression are not being very specific about the model, but 
> the underlying line of thought is that PCs with small variances are 
> "uninformative", so that you can make do with only the first handful 
> regressors. I tend to interpret "uninformative" as "noise-like" in these 
> contexts.
> 
> -pd
> 
>> 
>> Cheers, Jari Oksanen
>> 
>>>> On 25 Mar 2016, at 00:02 , Steve Bronder <sbron...@stevebronder.com> wrote:
>>>> 
>>>> I agree with Kasper, this is a 'big' issue. Does your method of taking only
>>>> n PCs reduce the load on memory?
>>>> 
>>>> The new addition to the summary looks like a good idea, but Proportion of
>>>> Variance as you describe it may be confusing to new users. Am I correct in
>>>> saying Proportion of variance describes the amount of variance with respect
>>>> to the number of components the user chooses to show? So if I only choose
>>>> one I will explain 100% of the variance? I think showing 'Total Proportion
>>>> of Variance' is important if that is the case.
>>>> 
>>>> 
>>>> Regards,
>>>> 
>>>> Steve Bronder
>>>> Website: stevebronder.com
>>>> Phone: 412-719-1282
>>>> Email: sbron...@stevebronder.com
>>>> 
>>>> 
>>>> On Thu, Mar 24, 2016 at 2:58 PM, Kasper Daniel Hansen <
>>>> kasperdanielhan...@gmail.com> wrote:
>>>> 
>>>>> Martin, I fully agree.  This becomes an issue when you have big matrices.
>>>>> 
>>>>> (Note that there are awesome methods for actually only computing a small
>>>>> number of PCs (unlike your code which uses svd which gets all of them);
>>>>> these are available in various CRAN packages).
>>>>> 
>>>>> Best,
>>>>> Kasper
>>>>> 
>>>>> On Thu, Mar 24, 2016 at 1:09 PM, Martin Maechler <
>>>>> maech...@stat.math.ethz.ch
>>>>>> wrote:
>>>>> 
>>>>>> Following from the R-help thread of March 22 on "Memory usage in prcomp",
>>>>>> 
>>>>>> I've starte

Re: [Rd] summary( prcomp(*, tol = .) ) -- and 'rank.'

2016-03-25 Thread Jari Oksanen

> On 25 Mar 2016, at 10:41 am, peter dalgaard <pda...@gmail.com> wrote:
> 
> As I see it, the display showing the first p << n PCs adding up to 100% of 
> the variance is plainly wrong. 
> 
> I suspect it comes about via a mental short-circuit: If we try to control p 
> using a tolerance, then that amounts to saying that the remaining PCs are 
> effectively zero-variance, but that is (usually) not the intention at all. 
> 
> The common case is that the remainder terms have a roughly _constant_, 
> small-ish variance and are interpreted as noise. Of course the magnitude of 
> the noise is important information.  
> 
But then you should use Factor Analysis which has that concept of “noise” 
(unlike PCA).

Cheers, Jari Oksanen

>> On 25 Mar 2016, at 00:02 , Steve Bronder <sbron...@stevebronder.com> wrote:
>> 
>> I agree with Kasper, this is a 'big' issue. Does your method of taking only
>> n PCs reduce the load on memory?
>> 
>> The new addition to the summary looks like a good idea, but Proportion of
>> Variance as you describe it may be confusing to new users. Am I correct in
>> saying Proportion of variance describes the amount of variance with respect
>> to the number of components the user chooses to show? So if I only choose
>> one I will explain 100% of the variance? I think showing 'Total Proportion
>> of Variance' is important if that is the case.
>> 
>> 
>> Regards,
>> 
>> Steve Bronder
>> Website: stevebronder.com
>> Phone: 412-719-1282
>> Email: sbron...@stevebronder.com
>> 
>> 
>> On Thu, Mar 24, 2016 at 2:58 PM, Kasper Daniel Hansen <
>> kasperdanielhan...@gmail.com> wrote:
>> 
>>> Martin, I fully agree.  This becomes an issue when you have big matrices.
>>> 
>>> (Note that there are awesome methods for actually only computing a small
>>> number of PCs (unlike your code which uses svd which gets all of them);
>>> these are available in various CRAN packages).
>>> 
>>> Best,
>>> Kasper
>>> 
>>> On Thu, Mar 24, 2016 at 1:09 PM, Martin Maechler <
>>> maech...@stat.math.ethz.ch
>>>> wrote:
>>> 
>>>> Following from the R-help thread of March 22 on "Memory usage in prcomp",
>>>> 
>>>> I've started looking into adding an optional   'rank.'  argument
>>>> to prcomp  allowing to more efficiently get only a few PCs
>>>> instead of the full p PCs, say when p = 1000 and you know you
>>>> only want 5 PCs.
>>>> 
>>>> (https://stat.ethz.ch/pipermail/r-help/2016-March/437228.html
>>>> 
>>>> As it was mentioned, we already have an optional 'tol' argument
>>>> which allows *not* to choose all PCs.
>>>> 
>>>> When I do that,
>>>> say
>>>> 
>>>>C <- chol(S <- toeplitz(.9 ^ (0:31))) # Cov.matrix and its root
>>>>all.equal(S, crossprod(C))
>>>>set.seed(17)
>>>>X <- matrix(rnorm(32000), 1000, 32)
>>>>Z <- X %*% C  ## ==>  cov(Z) ~=  C'C = S
>>>>all.equal(cov(Z), S, tol = 0.08)
>>>>pZ <- prcomp(Z, tol = 0.1)
>>>>summary(pZ) # only ~14 PCs (out of 32)
>>>> 
>>>> I get for the last line, the   summary.prcomp(.) call :
>>>> 
>>>>> summary(pZ) # only ~14 PCs (out of 32)
>>>> Importance of components:
>>>>                           PC1    PC2    PC3    PC4     PC5     PC6     PC7     PC8
>>>> Standard deviation     3.6415 2.7178 1.8447 1.3943 1.10207 0.90922 0.76951 0.67490
>>>> Proportion of Variance 0.4352 0.2424 0.1117 0.0638 0.03986 0.02713 0.01943 0.01495
>>>> Cumulative Proportion  0.4352 0.6775 0.7892 0.8530 0.89288 0.92001 0.93944 0.95439
>>>>                            PC9    PC10    PC11    PC12    PC13   PC14
>>>> Standard deviation     0.60833 0.51638 0.49048 0.44452 0.40326 0.3904
>>>> Proportion of Variance 0.01214 0.00875 0.00789 0.00648 0.00534 0.0050
>>>> Cumulative Proportion  0.96653 0.97528 0.98318 0.98966 0.99500 1.0000
>>>>> 
>>>> 
>>>> which computes the *proportions* as if there were only 14 PCs in
>>>> total (but there were 32 originally).
>>>> 
>>>> I would think that the summary should  or could in addition show
>>>> the usual  "proportion of variance explained"  like resu

Re: [Rd] Source code of early S versions

2016-02-29 Thread Jari Oksanen

> On 29 Feb 2016, at 20:54 pm, Barry Rowlingson <b.rowling...@lancaster.ac.uk> 
> wrote:
> 
> On Mon, Feb 29, 2016 at 6:17 PM, John Chambers <j...@r-project.org> wrote:
>> The Wikipedia statement may be a bit misleading.
>> 
>> S was never open source.  Source versions would only have been available 
>> with a nondisclosure agreement, and relatively few copies would have been 
>> distributed in source.  There was a small but valuable "beta test" network, 
>> mainly university statistics departments.
> 
> So it was free (or at least distribution cost only), but with a
> nondisclosure agreement? Did binaries circulate freely, legally or
> otherwise? Okay, guess I'll read the book.
> 
I don’t think I have seen the S source, but some other Bell software has a license 
of this type:

C THIS INFORMATION IS PROPRIETARY AND IS THE
C PROPERTY OF BELL TELEPHONE LABORATORIES,
C INCORPORATED.  ITS REPRODUCTION OR DISCLOSURE
C TO OTHERS, EITHER ORALLY OR IN WRITING, IS
C PROHIBITED WITHOUT WRITTEN PRERMISSION OF
C BELL LABORATORIES.
C IT IS UNDERSTOOD THAT THESE MATERIALS WILL BE USED FOR
C EDUCATIONAL AND INSTRUCTIONAL PURPOSES ONLY.

(Obviously in FORTRAN)

So the code was “open” in the sense that you could see the code, and it had to 
be “open", because source code  was the only way to distribute software before 
the era of widespread platforms allowing binary distributions (such as VAX/VMS 
or Intel/MS-DOS). However, the license in effect says that although you can see 
the code, you are not even allowed to tell anybody that you have seen it. I 
don’t know how this is interpreted currently, but you may ask the current 
owner, Nokia.

Cheers, Jari Oksanen
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] building with tcltk on Ubuntu 14.04

2015-05-28 Thread Jari Oksanen

On 28/05/2015, at 11:57 AM, Martin Maechler wrote:

 Ben Bolker <bbol...@gmail.com>
on Tue, 26 May 2015 11:13:41 -0400 writes:
 
 
 False alarm.  Completely wiping out my build directory followed by
 
 ../R-devel/configure --with-tcl-config=/usr/lib/tclConfig.sh
 - --with-tk-config=/usr/lib/tkConfig.sh; make
 
 seems to work.  (My fault for assuming repeated cycles of
 ./configure; make would actually do the right thing ...)
 
  There seems to be a corollary of Clarke's Law (any sufficiently
  advanced technology is indistinguishable from magic) that says that
  any sufficiently complex software system may *not* be magic, but it's
  just easier to treat it as though it is ...
 
 Thanks for the offer of help ...
 
  I also run several computers on Ubuntu 14.04
  and never had to do anything special, I mean *no*
  --with-tcl-... or --with-tk-...
  were ever needed for me on 14.04 or earlier Ubuntus... so I do
  wonder how you got into problems at all.
 
I also have the same problem with Ubuntu (at least in 14.04, now in 15.04): 
./configure does not find tcl/tk without --with-tcl-… and --with-tk-… 

They are in quite normal places, but still need manual setting. Currently I use 
something like --with-tcl-config=/usr/lib/tclConfig.sh 
--with-tk-config=/usr/lib/tkConfig.sh

I need these explicit switches only when the configure script has been rewritten. 
Normal compilation with ./configure works OK and finds Tcl/Tk, but a couple of times 
per year configure seems to change so much that I need to use these 
switches. I have had this problem for a couple of years.

If I had to guess, I am doing something wrong and against the instructions, and 
therefore I won't complain.

Cheers, Jari Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] vegan moved to GitHub and vegan 2.2-0 is coming (are you ready?)

2014-09-13 Thread Jari Oksanen
Dear R-Devels,

My apologies for using the wrong list. Please ignore my messages. I would undo 
this if I only could, but what's done can't be undone (not the first time 
in my life that I've learnt this).

Cheers, Jari Oksanen
On 13/09/2014, at 08:13 AM, Jari Oksanen wrote:

 Dear vegan team,
 
 Vegan development now happens completely on GitHub. The R-Forge repository is no 
 longer in sync with GitHub. I tried to commit all GitHub changes to R-Forge, 
 but a week ago I got a conflict in a file and I haven't had time to resolve 
 that conflict. You can follow vegan development and vegan discussion also 
 without signing to github. The system seems to be completely open and does 
 not require any user credentials (and this is good to remember when we 
 discuss about things). The developer front page is
 
 https://github.com/vegandevs
 
 We have prepared for the vegan 2.2-0 release. This would be a major release 
 that would close the gap between current development version and CRAN. I 
 haven't set any firm date for the release, but I think R 3.2.0 will be out in 
 October, and we should try to be concurrent with that -- in particular as the 
 2.0-10 in CRAN will give warnings in R check for that version.
 
 We have now solved a couple of major issues.
 
 - on technical side: the next R release will spit out tens of warnings for 
 namespace issues (visibility of functions). These were solved in 
 https://github.com/vegandevs/vegan/pull/28
 
 - all vegan functions doing permutation are now based on Gav's permute 
 package. This means that they can use any constrained permutation scheme of 
 the permute package. This also concerns functions that earlier only had 
 simple permutation. Of course, you do not need to use fancy permutation 
 schemes, but the default is still simple permutation and this can be 
 expressed by giving just the number of permutations on the command line. The 
 functions using the new permutation scheme are adonis, anosim, anova.cca for 
 CCA/RDA/dbRDA and hence also for ordistep etc., CCorA, envfit, mantel & 
 mantel.partial, mrpp, mso, permutest.betadisper, permutest.cca, protest and 
 simper. The change for the functions is now complete, but some clean up and 
 updating of documentation is still to be done. This is discussed in 
 https://github.com/vegandevs/vegan/issues/31
 
 - vegan 2.2-0 will also use parallel processing in several functions. This 
 was already done in several functions in vegan development. The discussion on 
 extending parallel processing to other functions was just opened in 
 https://github.com/vegandevs/vegan/issues/36 . Currently the following 
 functions can use parallel processing: adonis, anosim, anova.cca, mantel, 
 mantel.partial, mrpp and simper can use it in permutations, bioenv can assess 
 several competing models in parallel, metaMDS can launch several random 
 starts in parallel and oecosimu can use parallel processing in evaluating the 
 statistics for null communities. If you compare this to the previous list of 
 permutation functions, you see that the following permutation methods do not 
 use parallel processing: CCorA, envfit, mso, permutest.betadisper and protest. 
 The question is if these also should be parallelized or can we leave them 
 like they are, at least for the next release.
 
 - A more controversial issue is that Gav suggested moving rgl-based functions 
 away from vegan to a separate package 
 (https://github.com/vegandevs/vegan/issues/29 ). The main reason was that rgl 
 can cause problems in several platforms and even prevent installing vegan. 
 Indeed, when I tested these functions, they crashed in this Mac laptop. We 
 have now a separate vegan3d package for these functions 
 https://github.com/vegandevs/vegan3d . In addition to ordirgl + friends, 
 rgl.isomap and rgl.renyiaccum, it also has ordiplot3d. This package has 
 now the same functionality as these functions had in vegan, and our purpose 
 is to release that concurrently with vegan 2.2-0. I recently suggested to 
 remove these functions from vegan, but we haven't made that change yet so 
 that you can express your opinion on the move. See 
 https://github.com/vegandevs/vegan/pull/37
 
 There are some simpler and smaller things, but you can see those if you 
 follow github.
 
 I have now mainly worked with my private fork of vegan and pushed to vegan 
 upstream changes when they have looked more or less finished. At this stage, 
 I have made a pull request, and normally waited for possible comments. To 
 get a second opinion, I have usually waited for Gav to have a look at the 
 functions and let him merge them into vegan. Sometimes there has been a long 
 discussion before merge and we have edited the functions before the merge 
 (e.g., https://github.com/vegandevs/vegan/pull/34 ). If changes are small and 
 isolated bug fixes, I have pushed them directly to the vegan upstream, 
 though. I have found this pretty good way of working in github.
 
 Cheers, Jari Oksanen

[Rd] vegan moved to GitHub and vegan 2.2-0 is coming (are you ready?)

2014-09-12 Thread Jari Oksanen
Dear vegan team,

Vegan development now happens completely on GitHub. The R-Forge repository is no 
longer in sync with GitHub. I tried to commit all GitHub changes to R-Forge, but 
a week ago I got a conflict in a file and I haven't had time to resolve that 
conflict. You can follow vegan development and vegan discussion also without 
signing to github. The system seems to be completely open and does not require 
any user credentials (and this is good to remember when we discuss about 
things). The developer front page is

https://github.com/vegandevs

We have prepared for the vegan 2.2-0 release. This would be a major release 
that would close the gap between current development version and CRAN. I 
haven't set any firm date for the release, but I think R 3.2.0 will be out in 
October, and we should try to be concurrent with that -- in particular as the 
2.0-10 in CRAN will give warnings in R check for that version.

We have now solved a couple of major issues.

- on technical side: the next R release will spit out tens of warnings for 
namespace issues (visibility of functions). These were solved in 
https://github.com/vegandevs/vegan/pull/28

- all vegan functions doing permutation are now based on Gav's permute package. 
This means that they can use any constrained permutation scheme of the permute 
package. This also concerns functions that earlier only had simple permutation. 
Of course, you do not need to use fancy permutation schemes, but the default is 
still simple permutation and this can be expressed by giving just the number of 
permutations on the command line. The functions using the new permutation 
scheme are adonis, anosim, anova.cca for CCA/RDA/dbRDA and hence also for 
ordistep etc., CCorA, envfit, mantel & mantel.partial, mrpp, mso, 
permutest.betadisper, permutest.cca, protest and simper. The change for 
functions is now complete, but some clean up and updating of documentation is 
still to be done. This is discussed in 
https://github.com/vegandevs/vegan/issues/31

- vegan 2.2-0 will also use parallel processing in several functions. This was 
already done in several functions in vegan development. The discussion on 
extending parallel processing to other functions was just opened in 
https://github.com/vegandevs/vegan/issues/36 . Currently the following 
functions can use parallel processing: adonis, anosim, anova.cca, mantel, 
mantel.partial, mrpp and simper can use it in permutations, bioenv can assess 
several competing models in parallel, metaMDS can launch several random starts 
in parallel and oecosimu can use parallel processing in evaluating the 
statistics for null communities. If you compare this to the previous list of 
permutation functions, you see that the following permutation methods do not 
use parallel processing: CCorA, envfit, mso, permutest.betadisper and protest. 
The question is if these also should be parallelized or can we leave them like 
they are, at least for the next release.

- A more controversial issue is that Gav suggested moving rgl-based functions 
away from vegan to a separate package 
(https://github.com/vegandevs/vegan/issues/29 ). The main reason was that rgl 
can cause problems in several platforms and even prevent installing vegan. 
Indeed, when I tested these functions, they crashed in this Mac laptop. We have 
now a separate vegan3d package for these functions 
https://github.com/vegandevs/vegan3d . In addition to ordirgl + friends, 
rgl.isomap and rgl.renyiaccum, it also has ordiplot3d. This package has 
now the same functionality as these functions had in vegan, and our purpose is 
to release that concurrently with vegan 2.2-0. I recently suggested to remove 
these functions from vegan, but we haven't made that change yet so that you can 
express your opinion on the move. See https://github.com/vegandevs/vegan/pull/37

There are some simpler and smaller things, but you can see those if you follow 
github.

I have now mainly worked with my private fork of vegan and pushed to vegan 
upstream changes when they have looked more or less finished. At this stage, I 
have made a pull request, and normally waited for possible comments. To get a 
second opinion, I have usually waited for Gav to have a look at the functions and 
let him merge them into vegan. Sometimes there has been a long discussion before 
merge and we have edited the functions before the merge (e.g., 
https://github.com/vegandevs/vegan/pull/34 ). If changes are small and isolated 
bug fixes, I have pushed them directly to the vegan upstream, though. I have 
found this pretty good way of working in github.

Cheers, Jari Oksanen
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] historical significance of Pr(>Chisq) < 2.2e-16

2014-05-07 Thread Jari Oksanen
See ?format.pval
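
In short (a small illustration):

.Machine$double.eps   # 2.220446e-16, the default 'eps' of format.pval()
format.pval(1e-20)    # values below eps are printed as "< eps" rather than as a number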

cheers, jari oksanen

From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] on behalf 
of Michael Friendly [frien...@yorku.ca]
Sent: 07 May 2014 17:02
To: r-devel
Subject: [Rd] historical significance of Pr(>Chisq) < 2.2e-16

Where does the value 2.2e-16 come from in p-values for chisq tests such
as those
reported below?

  Anova(cm.mod2)
Analysis of Deviance Table (Type II tests)

Response: Freq
      LR Chisq Df Pr(>Chisq)    
B      11026.2  1  < 2.2e-16 ***
W       7037.5  1  < 2.2e-16 ***
Age      886.6  8  < 2.2e-16 ***
B:W     3025.2  1  < 2.2e-16 ***
B:Age   1130.4  8  < 2.2e-16 ***
W:Age    332.9  8  < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.  Chair, Quantitative Methods
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [RFC] A case for freezing CRAN

2014-03-21 Thread Jari Oksanen
Freezing CRAN solves no problem of reproducibility. If you know the 
sessionInfo() or the version of R, the packages used and their versions, you 
can reproduce that set up. If you do not know, then you cannot. You can try to 
guess: source code of old release versions of R and old packages are in the CRAN 
archive, and these files have dates. So you can collect a snapshot of R and 
packages for a given date. This is not an ideal solution, but it is the same 
level of reproducibility that you get with a strictly frozen CRAN. CRAN is not the 
sole source of packages, and even with a strictly frozen CRAN the users may have 
used packages from other sources. I am sure that if CRAN were frozen (but I 
assume it happens the same day hell freezes), people would increasingly often 
use other package sources than CRAN. The choice is easy if the alternatives are 
to wait for the next year for the bug fix release, or do the analysis now and 
use package versions in R-Forge or github. Then you could not assume that 
frozen CRAN packages were used.
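
(A concrete sketch of that guessing game, added here for illustration -- the 
package and version in the URL are arbitrary examples, not a recommendation:)

sessionInfo()   # records the R version and the package versions of a run
## later: fetch a dated source tarball from the CRAN archive; recent R can
## install straight from the URL, older R may need download.file() first
install.packages(
    "https://cran.r-project.org/src/contrib/Archive/vegan/vegan_2.0-5.tar.gz",
    repos = NULL, type = "source")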

CRAN policy is not made in this mailing list, and CRAN maintainers are so 
silent that it hurts ears. However, I hope they won't freeze CRAN. 

Strict reproduction seems to be harder than I first imagined: ./configure && 
make really failed for R 2.14.1 and older on my office desktop. To reproduce an 
older analysis, I would also need to install older tool sets (I suspect 
gfortran and cairo libraries).

CRAN is one source of R packages, and certainly its policy does not suit all 
developers. There is no policy that suits all.  Frozen CRAN would suit some, 
but certainly would deter some others. 

There seems to be a common sentiment here that the only reason anybody would use R 
older than 3.0.3 is to reproduce old results. My experience from the Real 
Life(™) is that many of us use computers that we do not own, but they are the 
property of our employer. This may mean that we are not allowed to install 
any software there, or we have to pay, or the department or project has to pay, 
to the computer administration for installing new versions of software (our 
case). This is often called security. Personally I avoid this by using a Mac 
laptop and Linux desktop: these are not supported by the University computer 
administration and I can do what I please with these, but poor Windows users 
are stuck. Computer classes are also maintained by centralized computer 
administration. This January they had new R, but last year it was still two 
years old. However, users can install packages in their personal folders so 
that they can use current packages even with older R. Therefore I want to take 
care that the packages I maintain also run in older R. Therefore I also applaud 
the current CRAN policy where new versions of packages are backported to 
previous R release: Even if you are stuck with stale R, you need not be stuck 
with stale packages. Currently I cannot test with older R than 2.14.2, though, 
but I do that regularly and certainly before CRAN releases.  If somebody wants 
to prevent this, they can set their package to unnecessarily depend on the 
current version of R. I would regard this as antisocial, but nobody would ask 
what I think about this so it does not matter.

The development branch of my package is in R-Forge, and only bug fixes and 
(hopefully) non-breaking enhancements (isolated so that they do not influence 
other functions, safe so that API does not change or  format of the output does 
not change) are merged to the CRAN release branch. This policy was adopted 
because it fits the current CRAN policy, and probably would need to change if 
CRAN policy changes.

Cheers, Jari Oksanen
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [RFC] A case for freezing CRAN

2014-03-21 Thread Jari Oksanen

On 21/03/2014, at 10:40 AM, Rainer M Krug wrote:

 
 
 This is a long and (mainly) interesting discussion, which is fanning out
 in many different directions, and I think many are not that relevant to
 the OP's suggestion. 
 
 I see the advantages of having such a dynamic CRAN, but also of having a
 more stable CRAN. I prefer CRAN as it is now, but in many cases a more
 stable CRAN might be an advantage. So having releases of CRAN might make
 sense. But then there is the archiving issue of CRAN.
 
 The suggestion was made to move the responsibility away from CRAN and
 the R infrastructure to the user / researcher to guarantee that the
 results can be re-run years later. It would be nice to have this built
 into CRAN, but let's stick with the scenario that the user should care for
 reproducibility.

There are two different problems that alternate in the discussion: 
reproducibility and breakage of CRAN dependencies. Frozen CRAN could make 
*approximate* reproducibility easier to achieve, but real reproducibility needs 
stricter solutions. Actual sessionInfo() is minimal information, but 
re-building a spitting image of old environment may still be demanding (but in 
many cases this does not matter). 

Another problem is that CRAN is so volatile that new versions of packages break 
other packages or old scripts. Here the main problem is how package developers 
work. Freezing CRAN would not change that: if package maintainers release 
breaking code, that would be frozen. I think that most packages do not make a 
distinction between development and release branches, and CRAN policy won't 
change that. 

I can sympathize with package maintainers having 150 reverse dependencies. My 
main package only has ~50, and it is sure that I won't test them all with a new 
release. I sometimes tried, but I could not even get all of those built because 
they had other dependencies on packages that failed. Even those that I could 
test failed to detect problems (in one case all examples were \dontrun and 
passed the tests nicely). I only wish that if people *really* depend on my package, 
they would test it against the R-Forge version and alert me before CRAN releases, but 
that is not very likely (I guess many dependencies are not *really* necessary, 
but only concern marginal features of the package, but CRAN forces one to declare 
those). 

Still a few words about reproducibility of scripts: this can hardly be achieved 
with good coverage, because many scripts are so very ad hoc. When I edit and 
review manuscripts for journals, I very often get Sweave or knitr scripts that 
"just work", where "just" means "just so and so". Often they do not work at 
all, because they had some undeclared private functionalities or stray files in 
the author's workspace that did not travel with the Sweave document. I think 
these -- published scientific papers -- are the main field where the code 
really should be reproducible, but they often are the hardest to reproduce. 
Nothing CRAN people do can help with sloppy code scientists write for 
publications. You know, they are scientists -- not engineers. 

Cheers, Jari Oksanen
 
 Leaving the issue of compilation out, a package which creates a
 custom installation of the R version, which includes the source of the R
 version used and the sources of the packages in a format compilable on
 Linux, given that the relevant dependencies are installed, would be a
 huge step forward. 
 
 I know - compilation on Windows (and sometimes Mac) is a serious
 problem - but to archive *all* binaries and to re-compile all older
 versions of R and all packages would be an impossible task.
 
 Apart from that - doing your analysis in a Virtual Machine and then
 simply archiving this Virtual Machine, would also be an option, but only
 for the more tech-savvy users.
 
 In a nutshell: I think a package would be able to provide the solution
 for a local archiving to make it possible to re-run the simulation with
 the same tools at a later stage - although guarantees would not be
 possible.
 
 Cheers,
 
 Rainer
 -- 
 Rainer M. Krug
 email: Raineratkrugsdotde
 PGP: 0x0F52F982
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [RFC] A case for freezing CRAN

2014-03-20 Thread Jari Oksanen

On 20/03/2014, at 14:14 PM, S Ellison wrote:

 If we could all agree on a particular set
 of cran packages to be used with a certain release of R, then it doesn't 
 matter
 how the 'snapshotting' gets implemented.
 
 This is pretty much the sticking point, though. I see no practical way of 
 reaching that agreement without the kind of decision authority (and effort) 
 that Linux distro maintainers put in to the internal consistency of each 
 distribution.
 
 CRAN doesn't try to do that; it's just a place to access packages offered by 
 maintainers. 
 
 As a package maintainer, I think support for critical version dependencies in 
 the imports or dependency lists is a good idea that individual package 
 maintainers could relatively easily manage, but I think freezing CRAN as a 
 whole or adopting single release cycles for CRAN would be thoroughly 
 impractical.
 

I have a feeling that this discussion has floated between two different 
arguments in favour of freezing: discontent with package authors who break 
their packages within the R release cycle, and the ability to reproduce old results. In 
the beginning the first argument was more prominent, but now the discussion has 
drifted to reproducing old results. 

I cannot see how freezing CRAN would help with package authors who do not 
separate development and CRAN release branches but introduce broken code, or 
code that breaks other packages. Freezing a broken snapshot would only mean 
that the situation cannot be cured before the next R release, and then new breakage 
could be introduced. The result would be a dysfunctional CRAN. I think that quite a 
few of the package updates are bug fixes and minor enhancements. Further, I do 
think that these should be backported to previous versions of R: users of 
previous versions of R should also benefit from bug fixes. This also is the 
current CRAN policy and I think this is a good policy. Personally, I try to 
keep my packages in such a condition that they will also work in previous 
versions of R so that people do not need to upgrade R to have bug fixes in 
packages. 

The policy is the same with Linux maintainers: they do not just build a 
consistent release, but maintain the release by providing bug fixes. In Linux 
distributions, end of life equals freezing, or not providing new versions of 
software.

Another issue is reproducing old analyses. This is a valuable thing, and 
sessionInfo and the ability to get certain versions of packages certainly are steps 
forward. It looks like guaranteed reproduction is a hard task, though. For 
instance, R 2.14.2 is the oldest version of R that I can build out of the box 
on my Linux desktop. I have earlier built older, even much older, R versions, 
but something has happened in my OS that crashes the build process. To 
reproduce an old analysis, I should also install an older version of my OS, 
then build the old R and then get the old versions of packages. It is nice if the 
last step is made easier.

Cheers, Jari Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] cat with backspace and newline characters

2013-11-07 Thread Jari Oksanen

On 07/11/2013, at 09:35 AM, Renaud Gaujoux wrote:

 I agree that the handling of \b is not that strange, once one agrees
 on what \b actually means, i.e. go back one character and not
 delete previous character.
 The fact that R GUI on Mac and Windows interprets/renders it
 differently shows that normality and strangeness is quite relative
 though.
 
As a user of a DEC LA120 terminal I expect the following:

> cat("a\b^\n")
â


Everything else feels like a bug.

Cheers, Jari Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Compatibility with R 2.15.x: Makefile for (non-Sweave) vignettes in vignettes/?

2013-10-13 Thread Jari Oksanen
Henrik,
On 14/10/2013, at 00:35 AM, Henrik Bengtsson wrote:

 In R 3.1.0 (~April 2014), support for vignettes in inst/doc/ will go
 away (and probably much sooner for CRAN submission), e.g.
 
 I've been sticking with inst/doc/ for backward compatible reasons so
 that I can use a fallback inst/doc/Makefile for building
 *non*-Sweave vignettes also under R 2.15.x.   AFAIK, it is not
 possible to put a Makefile under vignettes/, i.e. it is not possible
 to build non-Sweave vignette under vignettes/.
 

You can have a Makefile in vignettes, and at the moment this even passes CRAN 
tests. You may also need to have a vignettes/.install_extras file to move the 
produced non-vignette files to their final packaged location.
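
For what it is worth, a minimal sketch of such a file (the patterns below are 
invented for illustration, not from a real package): vignettes/.install_extras 
holds one Perl-style regular expression per line naming the extra files to 
keep with the installed vignettes, e.g.

mydoc\.pdf
^extra-figs$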

You still get warnings of unused, pointless and misleading files with R 2.15.3, 
because the R 3.0.2 packaging process makes files that R 2.15.3 regards as 
pointless and misleading. The CRAN policy seems to be to ignore those warnings. 
R is not backward compatible with herself, and I don't see much that a package 
author could do to work around this (apart from forking the package).

Cheers, Jari O.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] version comparison puzzle

2013-10-03 Thread Jari Oksanen
Actually, Bob O'Hara had a blog post about this in August 2012:

http://occamstypewriter.org/boboh/2012/08/17/lme4_destined_to_become_stable_through_rounding/

The concluding chapter reads:

I have been worried that lme4 will never become stable, but this latest 
version mollifies me with the thought that the developers can’t go on forever, 
so eventually lme4 will become stable when the machine precision forces it to 
be rounded up to 1.0

Cheers, Jari Oksanen

From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] on behalf 
of Martyn Plummer [plumm...@iarc.fr]
Sent: 03 October 2013 11:15
To: Ben Bolker
Cc: r-de...@stat.math.ethz.ch
Subject: Re: [Rd] version comparison puzzle

It's an underflow problem. When comparing versions, a.b.c is converted
first to the integer vector c(a,b,c) and then to the double precision
value

a + b/base + c/base^2

where base is 1 greater than the largest integer component of any of the
versions: i.e. 9912 in this case.  The last term is then smaller
than the machine precision so you can't tell the difference between
1.0.4 and 1.0.5.

Martyn

On Wed, 2013-10-02 at 23:41 -0400, Ben Bolker wrote:
  Can anyone explain what I'm missing here?

 max(pp1 <- package_version(c("0.9911.3", "1.0.4", "1.0.5")))
 ## [1] ‘1.0.4’

 max(pp2 <- package_version(c("1.0.3", "1.0.4", "1.0.5")))
 ## [1] ‘1.0.5’

 I've looked at ?package_version , to no avail.

 Since max() goes to .Primitive("max")
 I'm having trouble figuring out where it goes from there:
 I **think** this is related to ?xtfrm , which goes to
 .encode_numeric_version, which is doing something I really
 don't understand (it's in base/R/version.R ...)

 .encode_numeric_version(pp1)
 ## [1] 1 1 1
 ## attr(,"base")
 ## [1] 9912
 ## attr(,"lens")
 ## [1] 3 3 3
 ## attr(,".classes")
 ## [1] "package_version" "numeric_version"

 .encode_numeric_version(pp2)
 ## [1] 1.083333 1.111111 1.138889
 ## attr(,"base")
 ## [1] 6
 ## attr(,"lens")
 ## [1] 3 3 3
 ## attr(,".classes")
 ## [1] "package_version" "numeric_version"

 sessionInfo()
 R Under development (unstable) (2013-09-09 r63889)
 Platform: i686-pc-linux-gnu (32-bit)

 [snip]

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 loaded via a namespace (and not attached):
 [1] compiler_3.1.0 tools_3.1.0

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] question on why Rigroup package moved to Archive on CRAN

2013-03-10 Thread Jari Oksanen
What we've got here is failure to communicate. Some men you just can't reach. 
So you get what we had here last week, which is the way he wants it. Well, he 
gets it. I don't like it any more than you men. (from Cool hand Luke -- but 
whose fault?)

Cheers, Jari Oksanen

On 10/03/2013, at 17:18 PM, Uwe Ligges wrote:

 I wonder why you do not ask on CRAN@...? List members here cannot know the 
 answer. And we typically do not discuss such matters in public.
 
 I wonder why you do not read the e-mail message you get from the CRAN team?
 
 Please see the message with subject line Registering .External entry points 
 you got on January 20. You never answered nor fixed the package, hence the 
 package has been archived.
 
 Best,
 Uwe Ligges
 
 
 
 
 On 10.03.2013 02:43, Kevin Hendricks wrote:
 Hi Dan,
 
 In case this catches anyone else ...
 
 FWIW, I found the issue ...  in my Rinit.c, my package uses the .External 
 call which actually takes one SEXP which points to a varargs-like list.
 
 Under 2.15.X and earlier, I thought the proper entry for an .External call 
 was as below since it only does take one pointer as an argument:
 
 #include "Rigroup.h"
 
 /* Automate using sed or something. */
 #if _MSC_VER >= 1000
 __declspec(dllexport)
 #endif
 
static const R_ExternalMethodDef R_ExtDef[] = {
  {"igroupFuns", (DL_FUNC) igroupFuns, 1},
  {NULL, NULL, 0},
};
 
 void R_init_Rigroup(DllInfo *info)
 {
  R_registerRoutines(info, NULL, NULL, NULL, R_ExtDef);
 }
 
 
 But now according to the latest online docs on building your own package it 
 says:
 
 For routines with a variable number of arguments invoked via the .External 
 interface, one specifies -1 for the number of arguments which tells R not to 
 check the actual number passed. Note that the number of arguments passed to 
 .External are not currently checked but they will be in R 3.0.0.
 
 So I need to change my Rinit.c to change the 1 to a -1 and that error 
 should go away.
 
 Thanks again for all your help with this.  I will update my package and 
 resubmit it once version 3.0 gets released and I get a chance to verify that 
 this does in fact fix the problem.
 
 Kevin
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Keeping up to date with R-devel

2013-02-27 Thread Jari Oksanen

On 27/02/2013, at 18:08 PM, Dirk Eddelbuettel wrote:

 
 On 27 February 2013 at 17:16, Renaud wrote:
 | Hi,
 | 
 | thanks for the responses.
 | Dirk I found the script you posted once. Can anyone send me a link to the 
 | beaten to death post?
 
 Those were Simon's words, not mine, but I think he referred to the long-ish
 and painful thread here:
 
   http://thread.gmane.org/gmane.comp.lang.r.devel/32779
 
 Feel free to ignore the title, and most assertions by the OP which were never
 replicated by anybody else.  
 
 The "do not build in src" mantra was repeated a few times, and as I recall
 also refuted once (not by me).  That is not a topic I care much about; I use
 a shortcut, am aware of its (theoretical?) limits but for the casual R CMD
 check use I get out of R-devel never had an issue.
 
FWIW, I also build in src, at least twice weekly. It is a bit scary to confess 
this, but I'll duck and cover and I hope they will not catch me. This is a 
no-no, and if you run into trouble, you shall not make noise, but you have got to 
clean up your mess all by yourself. I didn't even know about distclean, but I 
do manual cleaning. When I run into trouble, the message is usually that 
there is no rule to build 'x' from 'z'. So I go to the offending directory 
(folder for Windows users), check which files are not under version control 
(svn st), remove those, ./configure && make. It has worked so far. The day it 
won't work, I'll remove my old src and start from square one with a virgin 
checkout, following the instructions. This has not happened yet, and I have 
done this for several months, over a year (I'm afraid that day of destruction is 
drawing nigh: this abomination must be stopped). I only do this in my home 
directory on my office desktop, I don't make install, but I have a symbolic 
link in ~/bin to the built binary in the build directory 
so that I can either use the stock R of my system (which still runs in the 2.14 
series) with stable packages, or an experimental R with experimental versions of 
packages. 

I think the rule is that you can do anything as long as you don't complain. If 
you want to complain, you must follow the instructions. 

Cheers, Jari Oksanen
--
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] It's a BUG or a Feature? Generating seq break comparing operators

2013-02-07 Thread Jari Oksanen
This should be FAQ 0.0. No other thing is asked as frequently as this. This is 
the FAQest of all FAQs, and the mother of all FAQs. At least this should be in the R 
posting guide: "Read FAQ 7.31 before posting!"
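
For the record, here is FAQ 7.31 in two lines (an illustration from me, not 
part of the original reply):

msd <- seq(0.05, 0.3, 0.01)
msd[13] == 0.17                    # FALSE: binary doubles cannot hold 0.17 exactly
isTRUE(all.equal(msd[13], 0.17))   # TRUE: compare with a tolerance instead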

Cheers, Jari Oksanen

On 07/02/2013, at 12:13 PM, R. Michael Weylandt wrote:

 R FAQ 7.31
 
 Cheers,
 MW
 
 On Thu, Feb 7, 2013 at 10:05 AM, Davide Rambaldi davide.ramba...@ieo.eu 
 wrote:
 Hello everybody:
 
 I get a strange behavior with seq, take a look at this:
 
 msd <- seq(0.05, 0.3, 0.01)
 msd[13]
 [1] 0.17
 class(msd)
 [1] "numeric"
 class(msd[13])
 [1] "numeric"
 typeof(msd[13])
 [1] "double"
 
 now the problem:
 
 msd[13] == 0.17
 [1] FALSE
 
 It is strange only to me?
 
 Consider that:
 
 0.17 == 0.17
 [1] TRUE
 
 and also
 
 a <- c(0, 1, 0.17)
 a
 [1] 0.00 1.00 0.17
 a[3] == 0.17
 [1] TRUE
 
 Is it a BUG in seq? I suspect something related to doubles …
 
 sessionInfo():
 
 R version 2.15.2 (2012-10-26)
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
 
 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 
 loaded via a namespace (and not attached):
 [1] tools_2.15.2
 
 
 ---
 PLEASE NOTE MY NEW EMAIL ADDRESS
 ---
 
 -
 Davide Rambaldi, PhD.
 -
 IEO ~ MolMed
 [e] davide.ramba...@ieo.eu
 [e] davide.ramba...@gmail.com
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD check not reading R_LIBS from ~/.R/check.Renviron

2013-01-17 Thread Jari Oksanen
Gav,

This is off-list since I only wonder what you are trying to do. It seems to me 
that you're trying something much too elegant and complicated. One thing that I 
have learnt to avoid is mucking about with environments: that is asking for trouble. One 
day they will change R and you will get errors that are very difficult to track 
(I had that once with gnome: I did not edit the config file manually, but only 
used configuration GUI tools, but then gnome changed and I could not start X11 
after distro upgrade -- and it was difficult to track the reason).

What I do with R-devel and R release co-existence is that I keep them 
completely separate. I do  not install (make install) R-devel, but I leave it 
in its working directory. I have now a symbolic link in ~/bin ($HOME/bin):

cd ~/bin
ln -s ~/R-devel/bin/R R3

So when I want to run R 3.0.0 I use 'R3' and when I want to use stock R of my 
distro (no 2.15.1) use R. These are completely different beasts, and packages 
are installed separately for each so that they are 3.0.0 version in R3 and 2.15 
versions in R. I don't edit environments, but always use defaults.

With this setup, analogue checks smoothly, with the only comment coming from 
the examples:

* checking differences from ‘analogue-Ex.Rout’ to ‘analogue-Ex.Rout.save’ ...
2c2
< This is vegan 2.1-23
---
> This is vegan 2.0-5
1104d1103
< Warning: argument 'tol.dw' is not used (yet)
4894d4892
< Warning: argument 'tol.dw' is not used (yet)
4933d4930
< Warning: argument 'tol.dw' is not used (yet)
6139d6135
< Warning: argument 'tol.dw' is not used (yet)
7078d7073
< Warning: argument 'tol.dw' is not used (yet)
7116d7110
< Warning: argument 'tol.dw' is not used (yet)
 OK

Cheers, Jari



From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] on behalf 
of Gavin Simpson [gavin.simp...@ucl.ac.uk]
Sent: 16 January 2013 22:13
To: R Devel Mailing List
Subject: [Rd] R CMD check not reading R_LIBS from ~/.R/check.Renviron

Dear List,

Further to my earlier email, I note that, for me at least, R CMD check
is *not* reading R_LIBS from ~/.R/check.Renviron on R 2.15.2 patched
(r61228) and R Under Development (r61660). The only way I can get R CMD
check to look for packages in a user-supplied library is by explicitly
exporting R_LIBS set to the relevant directory.

R CMD build *does* read R_LIBS from ~/.R/build.Renviron for the same
versions of R on the same Fedora 16 laptop. So I am in the strange
situation of being able to build but not check a source package having
followed the instructions in Writing R Extensions.

I have tried exporting R_CHECK_ENVIRON via

export R_CHECK_ENVIRON=/home/gavin/.R/check.Renviron

and that doesn't work either.

~/.R/check.Renviron contains:

R_LIBS=/home/gavin/R/libs/
#R_LIBS=/home/gavin/R/devlibs/

Anyone suggest how/where I am going wrong?

More complete system info follows below.

TIA

Gavin

 sessionInfo()
R version 2.15.2 Patched (2012-12-05 r61228)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8   LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=CLC_NAME=C
 [9] LC_ADDRESS=C  LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
[7] base

 sessionInfo()
R Under development (unstable) (2013-01-16 r61660)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8   LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=CLC_NAME=C
 [9] LC_ADDRESS=C  LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
[7] base
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] prcomp with previously scaled data: predict with 'newdata' wrong

2012-05-23 Thread Jari Oksanen
Hello folks,

it may be regarded as a user error to scale() your data prior to prcomp() 
instead of using its 'scale.' argument. However, it is something a user may 
do and it sounds like a legitimate thing to do, but in that case predict() with 
'newdata' can give wrong results:

x <- scale(USArrests)
sol <- prcomp(x)
all.equal(predict(sol), predict(sol, newdata=x))
## [1] "Mean relative difference: 0.9033485"

Predicting with the same data gives different results than the original PCA of 
the data.

The reason for this behaviour seems to be in these first lines of 
stats:::prcomp.default():

x <- scale(x, center = center, scale = scale.)
cen <- attr(x, "scaled:center")
sc <- attr(x, "scaled:scale")

If the input data 'x' have a 'scaled:scale' attribute, it will be retained if scale() 
is called with argument scale = FALSE, as is the case with the default options 
in prcomp(). So scale(scale(x, scale = TRUE), scale = FALSE) will have the 
'scaled:center' of the outer scale() (i.e., numerically zero), but the 
'scaled:scale' of the inner scale(). 

Function princomp finds the 'scale' directly instead of looking at the 
attributes of the input data, and works as expected:

sol <- princomp(x)
all.equal(predict(sol), predict(sol, newdata=x))
## [1] TRUE

I don't have any nifty solution to this -- only checking the 'scale.' argument 
and acting accordingly:

sc <- if (scale.) attr(x, "scaled:scale") else FALSE
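
(As a side note from me, not part of the original report: a user-side way to 
avoid the mismatch is to let prcomp() do the scaling itself.)

sol <- prcomp(USArrests, scale. = TRUE)
all.equal(predict(sol), predict(sol, newdata = USArrests))
## [1] TRUE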

Cheers, Jari Oksanen


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] prcomp with previously scaled data: predict with 'newdata' wrong

2012-05-23 Thread Jari Oksanen
To fix myself: the stupid solution I suggested won't work, as 'scale.' need not 
be TRUE or FALSE, but it can be a vector of scales. The following looks like it 
is able to handle this, but is neither transparent nor elegant:

sc <- if (isTRUE(scale.)) attr(x, "scaled:scale") else scale.

I trust you find an elegant solution (if you think this is worth fixing).

Cheers, Jari Oksanen

PS. Sorry for the top posting: cannot help with the email system I have in my 
work desktop.

From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] on behalf 
of Jari Oksanen [jari.oksa...@oulu.fi]
Sent: 23 May 2012 13:51
To: r-de...@stat.math.ethz.ch
Subject: [Rd] prcomp with previously scaled data: predict with 'newdata'
wrong

Hello folks,

it may be regarded as a user error to scale() your data prior to prcomp() 
instead of using its 'scale.' argument. However, it is something a user may 
do and it sounds like a legitimate thing to do, but in that case predict() with 
'newdata' can give wrong results:

x <- scale(USArrests)
sol <- prcomp(x)
all.equal(predict(sol), predict(sol, newdata=x))
## [1] "Mean relative difference: 0.9033485"

Predicting with the same data gives different results than the original PCA of 
the data.

The reason for this behaviour seems to be in these first lines of 
stats:::prcomp.default():

x <- scale(x, center = center, scale = scale.)
cen <- attr(x, "scaled:center")
sc <- attr(x, "scaled:scale")

If the input data 'x' have a 'scaled:scale' attribute, it will be retained if scale() 
is called with argument scale = FALSE, as is the case with the default options 
in prcomp(). So scale(scale(x, scale = TRUE), scale = FALSE) will have the 
'scaled:center' of the outer scale() (i.e., numerically zero), but the 
'scaled:scale' of the inner scale().

Function princomp finds the 'scale' directly instead of looking at the 
attributes of the input data, and works as expected:

sol <- princomp(x)
all.equal(predict(sol), predict(sol, newdata=x))
## [1] TRUE

I don't have any nifty solution to this -- only checking the 'scale.' argument 
and acting accordingly:

sc <- if (scale.) attr(x, "scaled:scale") else FALSE

Cheers, Jari Oksanen


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] --as-cran and need to ignore.svn directories

2012-03-12 Thread Jari Oksanen

On 12/03/2012, at 18:03 PM, Paul Johnson wrote:

 Good morning:
 
 I submitted a package update to CRAN and got a bounce because I had
 not run R CMD check with --as-cran.  I'd not heard of that before,
 but I'm glad to know about it now.
 
 I see it warns when my functions do use partial argument matching, and
 I like that advice very much.
 
 Also I see this warning
 
 * checking package subdirectories ... WARNING
 Found the following directory(s) with names of version control directories:
  ./.svn
  ./R/.svn
  ./data/.svn
  ./inst/.svn
  ./inst/doc/.svn
  ./inst/examples/.svn
  ./vignettes/.svn
 These should not be in a package tarball.
 
 Is there a way to cause R to ignore the .svn folders while running R
 CMD check --as-cran or R CMD build?
 
 It seems a little tedious to have to copy the whole directory tree to
 some other place and remove the .svn folders before building. I can do
 it, but it just seems, well, tedious. I have the feeling that you
 frequent flyers  would have worked around this already.


Paul,

I think the best solution is to 'svn export' the svn directory to a temporary 
directory/folder:

svn export my-svn-directory tmp-pkg-directory
R CMD build tmp-pkg-directory
R CMD check --as-cran ...

The two advantages of 'svn export' are that it (1) strips the .svn-specific files, 
and (2) it really exports only those files that actually are under version 
control. More than once I have had some non-svn files in my svn directory 
so that *my* version of the package worked, but the one actually in subversion 
failed.

Cheers, Jari Oksanen 

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NOTE: unstated dependencies in examples

2011-10-15 Thread Jari Oksanen
Hello folks,

To keep it short, I cut out most of the spurious controversy, and go to the
key point (and it also helps to go to the sauna and then sleep well all night):

On 14/10/11 22:30 PM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:

 
 Also note that the package would be accepted on CRAN as is, if you
 declared parallel as a Suggests, as far as I understand Jari. At least
 binaries for Windows for old R versions will be built, since I am
 checking with
 _R_CHECK_FORCE_SUGGESTS_=FALSE
 on Windows.  Therefore, I believe (I haven't seen the package) this
 discussion is meaningless anyway.

This is fine and solves the problems I anticipated. I did not know about this
possibility. It was not shown in R CMD check --help, nor in the usual
manuals I read: it seems to be mentioned in R-ints.texi, but not in
R-exts.texi nor in R-admin.texi.

Although I feel well at the moment, I return to the old theme: about the kind
of keyword describing packages that you don't necessarily need, and which
are used in the style

if (require(foo)) { do_something_fancy_with_foo::foo() }

They are "sugar": parallel, foo. They are not necessarily needed, and if you
don't have them you don't necessarily even know you need them.
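
A minimal sketch of that pattern as it applies to 'parallel' (my own
illustration, not code from any of the packages discussed; the serial
fallback branch is an assumption about how such code is typically written):

if (require(parallel, quietly = TRUE) && isTRUE(detectCores() > 1)) {
    cl <- makeCluster(2)                        # two worker processes
    res <- parLapply(cl, 1:4, function(i) i^2)  # the "fancy" parallel branch
    stopCluster(cl)
} else {
    res <- lapply(1:4, function(i) i^2)         # works everywhere, serially
}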

Then about old R and new packages: many of us are in situations where we
must use an old version of R. However, we can still install packages in
private libraries without admin privileges. They may not be system-wide, and
they can be wiped out in the next boot, or you may need to have them in your
USB stick, but installing a package is rather a light operation which can
be done except in the most paranoid systems. One year I burned an R
installation to a CD that I distributed to the students so that they could
run R in a PC class with a too ancient R. On one occasion I gave students
temporary usernames on my personal Linux desktop so that they could log in
to my desktop from the class for one analysis (but that is in general too
clumsy as Windows did not have good X11).

New package versions can contain bug fixes and some enhanced functionality
in addition to radical new features that require a bleeding edge R.
Personally, I try to keep my release versions such that they work in
current, previous and future major versions of R. Currently I test the
package more or less regularly in R 2.13.2 and R-to-be-2.14.0 in MacOS, and
in 2.12.2 and R-to-be-2.15.0 in Linux, and I expect the release version to
pass all these. The development version can fail in older R, but then we
(the team) must judge if we merge such failing features to the release.

Cheers, Jari Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NOTE: unstated dependencies in examples

2011-10-14 Thread Jari Oksanen
On Thu, 2011-10-13 at 17:34 +0200, Uwe Ligges wrote:
 I looked at the code and since this is not that trivial to change, I 
 think we can well live with typing
 
 grep -r gplots ./man
 
 which is not too hard to run on the source package, I believe.
 
 Best wishes,
 Uwe
 
Uwe & others,

This is OK if you want to identify the cause of the problems. However,
the basic problem was that checking required something that is not
required: there was one example that was not run, and one case where the
loading of the package was not necessary (if(require(package))). I do
believe that handling this kind of cases is difficult in automatic
checking. However, I think they need not be checked: there should be a
new case of package reference in addition to 'depends', 'suggests' and
'enhances' -- something like 'benefitsfrom'.

This is now actual to me, since I'm adding 'parallel' support to my
package, but there seems to be no clean way of doing this with the
current checking procedures. I use the 'parallel' support only if the
package is available (in R >= 2.14.0, not yet released), and there are
multiple cores. If there is only one cpu or there is no 'parallel'
package yet, nothing bad will happen: things will only work like
they worked earlier without the 'parallel' package. I haven't found out how
to do this cleanly for R CMD check (it is clean for my code since there
the usage is checked). If I add "Suggests: parallel" I get an R CMD check
error for the current and previous R -- for no reason. So currently I
don't mention 'parallel' at all in DESCRIPTION: I get a NOTE and
Warnings ('require' call not declared, no visible definitions), but this
is a smaller problem than having a spurious failure, and failing to
have this package for a system where it works quite normally.

The new DESCRIPTION keyword could be used for packages that are useful
but not necessary, so that the package can quite well be used without
these packages, but it may have some extra options or functionality with
those packages. This sounds like a suggestion to me, but in the R language
suggestions cannot be refused.

Cheers, jari oksanen

 
 On 13.10.2011 03:00, Yihui Xie wrote:
  You have this in Jevons.Rd:
 
  # show as balloonplots
 
  if (require(gplots)) {
 
 
  and this in Snow.Rd:
 
  %\dontrun{
 
  library(sp)
 
 
  It will certainly be helpful if R CMD check can provide more
  informative messages (in this case, e.g, point out the Rd files).
 
  Regards,
  Yihui
  --
  Yihui Xiexieyi...@gmail.com
  Phone: 515-294-2465 Web: http://yihui.name
  Department of Statistics, Iowa State University
  2215 Snedecor Hall, Ames, IA
 
 
 
  On Wed, Oct 12, 2011 at 11:33 AM, Michael Friendlyfrien...@yorku.ca  
  wrote:
  Using R 2.13.1, I am now getting the following NOTE when I run R CMD check
  on my HistData
  package
 
  * checking for unstated dependencies in examples ... NOTE
  'library' or 'require' calls not declared from:
gplots sp
 
  Under R 2.12.x, I didn't get these notes.
 
  I have ~ 25 .Rd files in this package, and AFAICS, every example uses
  library or require for the
  functions used;  the DESCRIPTION file has the long list of Suggests, which
  previously was sufficient
  for packages used in examples.
 
  Suggests: gtools, KernSmooth, maps, ggplot2, proto, grid, reshape, plyr,
  lattice, ReadImages, car
 
  But I have no way to find the .Rd file(s) that triggered this note.
 
What is the tool used in R CMD check to make this diagnosis?  It would be
  better
  if this reported the .Rd file(s) that triggered this note.
  Is it possible that this note could be specious?
 
  -Michael
 
  --
  Michael Friendly Email: friendly AT yorku DOT ca
  Professor, Psychology Dept.
  York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
  4700 Keele StreetWeb:   http://www.datavis.ca
  Toronto, ONT  M3J 1P3 CANADA
 
  __
  R-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-devel
 
 
  __
  R-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-devel
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NOTE: unstated dependencies in examples

2011-10-14 Thread Jari Oksanen
On 14/10/11 16:26 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

 On 14/10/2011 9:18 AM, Jari Oksanen wrote:

 
  Uwe & others,
 
 This is OK if you want to identify the cause of the problems. However,
 the basic problem was that checking required something that is not
 required: there was one example that was not run, and one case where the
 loading of the package was not necessary (if(require(package))). I do
 believe that handling this kind of cases is difficult in automatic
 checking. However, I think they need not be checked: there should be a
 new case of package reference in addition to 'depends', 'suggests' and
 'enhances' -- something like 'benefitsfrom'.
 
 Users use those declarations when they ask to install dependencies.  If
 you don't declare a dependence on a contributed package, users will have
 to manually install it.
 
Howdy,

This is a pretty weak argument in this particular case: 'parallel' is not a
contributed package so that you cannot install it. You either have it or you
don't have it. In the latter case, nothing happens, but everything works like
usual. In the former case, you may have some new things.

(Having 'parallel' as a contributed package for R < 2.14.0 would be a great
idea but not something I dare to suggest.)

 This is now actual to me, since I'm adding 'parallel' support to my
 package, but there seems to be no clean way of doing this with the
 current checking procedures. I use the 'parallel' support only if the
  package is available (in R >= 2.14.0, not yet released), and there are
 multiple cores.
 
  Temporarily maintain two releases of your package:  one for R < 2.14.0
  that doesn't mention parallel, and one for R >= 2.14.0 that does.  The
  second one should declare its dependence on R >= 2.14.0.  If support for
 parallel is your only change, you don't need to do anything for the
 previous one:  CRAN will not replace it in the 2.13.x repository if the
 new one needs a newer R.
 
Forking my package was indeed one of the three alternatives I have
considered. In this case forking sounds really weird: for R < 2.13.0 both
forks would work identically. The only difference would be how they are
handled by the R checkers.
Cheers, Jari Oksanen
-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NOTE: unstated dependencies in examples

2011-10-14 Thread Jari Oksanen
On 14/10/11 19:00 PM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:

 
 
 On 14.10.2011 16:15, Duncan Murdoch wrote:
 On 14/10/2011 10:10 AM, Jari Oksanen wrote:
 On 14/10/11 16:26 PM, Duncan Murdochmurdoch.dun...@gmail.com wrote:
 
 On 14/10/2011 9:18 AM, Jari Oksanen wrote:
 
 
  Uwe & others,
 
 This is OK if you want to identify the cause of the problems. However,
 the basic problem was that checking required something that is not
 required: there was one example that was not run, and one case
 where the
 loading of the package was not necessary (if(require(package))).
 I do
 believe that handling this kind of cases is difficult in automatic
 checking. However, I think they need not be checked: there should be a
 new case of package reference in addition to 'depends', 'suggests' and
 'enhances' -- something like 'benefitsfrom'.
 
 Users use those declarations when they ask to install dependencies. If
 you don't declare a dependence on a contributed package, users will
 have
 to manually install it.
 
 Howdy,
 
 This is a pretty weak argument in this particular case: 'parallel' is
 not a
 contributed package so that you cannot install it. You either have it
 or you
  don't have it. In the latter case, nothing happens, but everything works like
 usual. In the former case, you may have some new things.
 
  (Having 'parallel' as a contributed package for R < 2.14.0 would be a
 great
 idea but not something I dare to suggest.)
 
 This is now actual to me, since I'm adding 'parallel' support to my
 package, but there seems to be no clean way of doing this with the
 current checking procedures. I use the 'parallel' support only if the
  package is available (in R >= 2.14.0, not yet released), and there are
 multiple cores.
 
  Temporarily maintain two releases of your package: one for R < 2.14.0
  that doesn't mention parallel, and one for R >= 2.14.0 that does. The
  second one should declare its dependence on R >= 2.14.0. If support for
 parallel is your only change, you don't need to do anything for the
 previous one: CRAN will not replace it in the 2.13.x repository if the
 new one needs a newer R.
 
 Forking my package was indeed one of the three alternatives I have
  considered. In this case forking sounds really weird: for R < 2.13.0 both
 forks would work identically. The only difference being how they are
 handled
 by R checkers.
 
 I don't see why it's weird to require that a version that uses a
 facility that is in 2.14.0 but no earlier versions should have to
 declare that. Sure, you can put all sorts of conditional tests into your
 code so that it avoids using the new facility in older versions, but
 isn't it simpler to just declare the dependency and avoid cluttering
 your code with those tests?
 
 Indeed, I think you should update your package and declare the
  dependency on R >= 2.14.0. This seems to be the cleanest possible
 approach. Distributing a contributed parallel package without
  functionality for R < 2.14.0 is not, why should anybody develop code for
 R versions that won't be supported any more in due course?

Here is one reason: Our PC labs now have R version 2.12.something and it is not
in my power to upgrade R, but that depends on the will of our computing
centre. If it is upgraded, it will not be 2.14.something. A simple desire
to be able to use the package in the environment where I work sounds like a valid
personal reason.

A second point is that the package would not *depend* or anything on R >=
2.14.0. It could be faster in some cases, but not in all. It would be just as
legitimate to have a condition that the package cannot be used by those
poor sods who have but one processor (and I was one just a short time
ago). Indeed, this is exactly the same condition: you *must* have the hardware
I want you to have, and the version of R I want you to have. I won't make that
requirement.

Like I wrote in my previous message, I had considered three choices. One was
forking, another was delaying the release of these features till 2.14.* is
old, and the third was to depend on 'snow' *and* 'multicore' instead of
'parallel'. Now the second choice sounds the best.

Cheers, Jari Oksanen

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Standardized Pearson residuals

2011-03-15 Thread Jari Oksanen
On 15/03/11 13:17 PM, peter dalgaard pda...@gmail.com wrote:

 
 On Mar 15, 2011, at 04:40 , Brett Presnell wrote:
 
 
 Background: I'm currently teaching an undergrad/grad-service course from
 Agresti's Introduction to Categorical Data Analysis (2nd edn) and
 deviance residuals are not used in the text.  For now I'll just provide
 the students with a simple function to use, but I prefer to use R's
 native capabilities whenever possible.
 
 Incidentally, chisq.test will have a stdres component in 2.13.0 for
 much the same reason.
 
 Thank you.  That's one more thing I won't have to provide code for
 anymore.  Coincidentally, Agresti mentioned this to me a week or two ago
 as something that he felt was missing, so that's at least two people who
 will be happy to see this added.
 
 
 And of course, I was teaching a course based on Agresti & Franklin:
 Statistics, The Art and Science of Learning from Data, when I realized that
 R was missing standardized residuals.
 
So nobody uses McCullagh & Nelder: Generalized Linear Models in teaching,
since they don't realize that R is missing Anscombe residuals, too?

Cheers, Jari Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] postscript failure manifests in plot.TukeyHSD

2010-12-15 Thread Jari Oksanen
On 16/12/10 04:24 AM, Paul Murrell p.murr...@auckland.ac.nz wrote:

 Hi
 
 According to the PostScript Language Reference Manual and the PDF
 Reference, in both PDF and PostScript ...
 
 ... a line width of zero is valid, but not recommended (and is clearly
 not supported by some viewers).
 
 ... a line dash pattern cannot be specified as all zero lengths.
 (So, because R generates the line dash pattern proportional to the line
 width, a specification of lwd=0 and
 lty=anything-other-than-solid-or-none does not make sense.)
 
 I think three fixes are required:
 
 (i)  Enforce a minimum line width of 0.01 (mainly because that is not
 zero, but also because that is the smallest value greater than zero when
 you round to 2dp like the PDF and PostScript devices do and it's still
 REALLY thin).
 
 (ii) If the line dash pattern ends up as all zeroes (to 2dp), because
 the line width is so small (thin), force the dash pattern to solid
 instead.
 
 (iii) plot.TukeyHSD() should not use lwd=0  (0.5 is plenty difference to
 be obviously lighter than the main plot lines)
 
 I will commit these unless there are better suggestions or bitter
 objections.
 
Paul,

The difference between the previously working version (R 2.11.1) and the failing
current-as-of-yesterday version (R 2.12.1 RC) was:

$ diff -U2 oldtukeyplot.ps /Volumes/TIKKU/tukeyplot.ps
--- oldtukeyplot.ps2010-12-14 12:06:07.0 +0200
+++ /Volumes/TIKKU/tukeyplot.ps2010-12-14 12:13:32.0 +0200
@@ -172,5 +172,5 @@
 0 setgray
 0.00 setlinewidth
-[ 3.00 5.00] 0 setdash
+[ 0.00 0.00] 0 setdash
 np
 660.06 91.44 m

So 0.00 setlinewidth worked, but [0.00 0.00] 0 setdash failed. Assuming
PostScript is anything like English, it is the all-zero dash that caused the
failure. 

Cheers, Jari Oksanen
 Paul
 
 On 15/12/2010 7:20 a.m., Ben Bolker wrote:
 On 10-12-14 01:16 PM, Peter Ehlers wrote:
 On 2010-12-14 09:27, Ben Bolker wrote:
  Jari Oksanen <jari.oksanen at oulu.fi> writes:
 
 
 Hello R Developers,
 
 Dear R-developers,
 
 I ran some standard tests with currently (today morning) compiled R
 release
 candidate in Linux R 2.12.1 RC (2010-12-13 r53843). Some of these
 tests used
 plot.TukeyHSD function. This worked OK on the screen (X11 device), but
 PostScript file could not be rendered. The following example had the
 problem
 with me:
 
  postscript(file="tukeyplot.ps")
 example(plot.TukeyHSD)
 dev.off()
 
 I couldn't view the resulting file with evince in Linux nor in the
 standard
 Preview in MacOS. When I compared the generated tukeyplot.ps to the
 same
 file generated with an older R in my Mac, I found one difference:
 
 $ diff -U2 oldtukeyplot.ps /Volumes/TIKKU/tukeyplot.ps
 --- oldtukeyplot.ps2010-12-14 12:06:07.0 +0200
 +++ /Volumes/TIKKU/tukeyplot.ps2010-12-14 12:13:32.0 +0200
 @@ -172,5 +172,5 @@
0 setgray
0.00 setlinewidth
 -[ 3.00 5.00] 0 setdash
 +[ 0.00 0.00] 0 setdash
np
660.06 91.44 m
 
 Editing the changed line to its old value [ 3.00 5.00] 0 setdash also
 fixed the problem both in Linux and in Mac. Evidently something has
 changed,
 and probably somewhere else than in plot.TukeyHSD (which hasn't changed
 since r51093 in trunk and never in R-2-12-branch). I know nothing about
 PostScript so that I cannot say anything more (and I know viewers can
 fail
 with standard conforming PostScript but it is a bit disconcerting
 that two
 viewers fail when they worked earlier).
 
 I must really be avoiding work today ...
 
 I can diagnose this (I think) but don't know the best way to
 solve it.
 
  At this point, line widths on PDF devices were allowed to be < 1.
 
 ==
 r52180 | murrell | 2010-06-02 23:20:33 -0400 (Wed, 02 Jun 2010) | 1 line
 Changed paths:
  M /trunk/NEWS
  M /trunk/src/library/grDevices/src/devPS.c
 
 allow lwd less than 1 on PDF device
 ==
 
 The behavior of PDF devices (by experiment) is to draw a 0-width
 line as 1 pixel wide, at whatever resolution is currently being
 rendered.  On the other hand, 0-width lines appear to break PostScript.
 (with the Linux viewer 'evince' I get warnings about rangecheck -15
 when trying to view such a file).
 
 plot.TukeyHSD  contains the lines
 
  abline(h = yvals, lty = 1, lwd = 0, col = "lightgray")
 abline(v = 0, lty = 2, lwd = 0, ...)
 
 which are presumably meant to render minimum-width lines.
 
 I don't know whether it makes more sense to (1) change plot.TukeyHSD
 to use positive widths (although that may not help: I tried setting
 lwd=1e-5 and got the line widths rounded to 0 in the PostScript file);
  (2) change the postscript driver to *not* allow line widths < 1 (i.e.,
 distinguish between PS and PDF and revert to the pre-r52180 behaviour
 for PS only).
 
 On reflection #2 seems to make more sense, but digging through devPS.c
 it's not immediately obvious to me where/how in SetLineStyle or
 PostScriptSetLineTexture one can tell whether the current driver
 is PS or PDF ...
 
 That may not do it. I find the same

[Rd] postscript failure manifests in plot.TukeyHSD

2010-12-14 Thread Jari Oksanen
Hello R Developers,

Dear R-developers, 

I ran some standard tests with currently (today morning) compiled R release
candidate in Linux R 2.12.1 RC (2010-12-13 r53843). Some of these tests used
plot.TukeyHSD function. This worked OK on the screen (X11 device), but
PostScript file could not be rendered. The following example had the problem
with me:

postscript(file="tukeyplot.ps")
example(plot.TukeyHSD)
dev.off()

I couldn't view the resulting file with evince in Linux nor in the standard
Preview in MacOS. When I compared the generated tukeyplot.ps to the same
file generated with an older R in my Mac, I found one difference:

$ diff -U2 oldtukeyplot.ps /Volumes/TIKKU/tukeyplot.ps
--- oldtukeyplot.ps2010-12-14 12:06:07.0 +0200
+++ /Volumes/TIKKU/tukeyplot.ps2010-12-14 12:13:32.0 +0200
@@ -172,5 +172,5 @@
 0 setgray
 0.00 setlinewidth
-[ 3.00 5.00] 0 setdash
+[ 0.00 0.00] 0 setdash
 np
 660.06 91.44 m

Editing the changed line to its old value [ 3.00 5.00] 0 setdash also
fixed the problem both in Linux and in Mac. Evidently something has changed,
and probably somewhere else than in plot.TukeyHSD (which hasn't changed
since r51093 in trunk and never in R-2-12-branch). I know nothing about
PostScript so that I cannot say anything more (and I know viewers can fail
with standard conforming PostScript but it is a bit disconcerting that two
viewers fail when they worked earlier).

Cheers, Jari Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] One possible cause for incorrect symbols in X11() output

2010-08-19 Thread Jari Oksanen
On 19/08/10 09:55 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

 There have been spasmodic reports of symbols such as pi and infinity
 in plotmath being reproduced incorrectly on the X11 device on some
 Linux systems (at least Ubuntu 10 and Fedora 12/13), and we've managed
 to track down one cause whilst investigating PR#14355.
 
 Some systems have Wine and hence the Wine symbol font installed.
 'fontconfig', which is used by cairographics in X11(type='cairo') and
 many other applications, prefers the Wine symbol font to the standard
 Type 1 URW font, and seems to misinterpret its encoding.
 
 You may well have Wine installed without realizing it (as I did) -- it
 is increasingly common as a dependency of other software. The best
 test is to run
 
 % fc-match symbol
 s05l.pfb: Standard Symbols L Regular
 
 This is the result on a system without Wine: if you see
 
 % fc-match symbol
 symbol.ttf: Symbol Regular
 
This seems to be the case with MacOS (10.6.4):

$ uname -a
Darwin lettu-2.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23
18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386
$ fc-match symbol
Symbol.ttf: Symbol 標準體

The X11(type = 'cairo') shows the problem with example(points);
TestChars(font=5). However, there is no problem with the default device
(quartz), nor with the default X11() which has type = 'Xlib' (unlike what is
documented in ?X11: 'cairo' is available but 'Xlib' is still used).

Whatever this is worth (if this is worthless, I'll surely hear about
it).

Cheers, Jari Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] One possible cause for incorrect symbols in X11() output

2010-08-19 Thread Jari Oksanen
On 19/08/10 14:04 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
 OSX. I can't get fc-match to list it, anyway.
 
 R's X11(type='cairo') device is using a version of cairographics
 compiled by Simon which includes a static build of fontconfig.  So it
 is not really 'OSX'!  I'm guessing you are using
 /usr/local/bin/fc-match which AFAIK also Simon's.

$ which fc-match
/usr/X11/bin/fc-match

There seems to be no fc-match in /usr/local/bin/ in my Mac, so no Simon's
utilities. (But this is, of course, pretty irrelevant for the main subject,
and it seems that my installation of Ubuntu 10.04 is not affected by the
problem but has quite regular fonts -- no Wine today. Better that I shut
up).

Cheers, Jari Oksanen
 It is also not using pango, and so not selecting fonts the same way as
 on Linux.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] warning from install.packages()

2010-05-25 Thread Jari Oksanen
On 25/05/10 23:25 PM, Ben Bolker bol...@ufl.edu wrote:

 
  Just curious: is there a particular reason why install.packages()
gives a
 warning in normal use when 'lib' is not specified (e.g. argument
'lib' is
 missing: using '/usr/local/lib/R/site-library' )?  It would
seem to me that
 this is normal behavior as documented ( If missing,
defaults to the first
 element of '.libPaths()'.)

Indeed, should this be a message()?

cheers, jaz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R-Forge Problems

2010-05-05 Thread Jari Oksanen
On 5/05/10 20:53 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

 If I go to:
 
 http://r-forge.r-project.org/scm/?group_id=18
 
 and click on [Browse Subversion Repository] in the box to the right it
 takes me to a page that says this:
 
 Traceback (most recent call last): File
 /usr/lib/gforge/bin//viewcvs.cgi, line 27, in import sapi
 ImportError: No module named sapi
 
 whereas I was expecting to get to the svn repository.

Gabor,

This was already queried in R-Forge site-help (May 2) and reported as bug in
R-Forge (May 3, bug #925). There has been no response to either of these
reports. A News message of April 29 on the R-Forge front page predicts that
browser functionality will follow soon. So there is hope...

Cheers, Jari Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Canberra distance

2010-02-06 Thread Jari Oksanen



On 06/02/2010 18:10, Duncan Murdoch murd...@stats.uwo.ca wrote:

 On 06/02/2010 10:39 AM, Christophe Genolini wrote:
 Hi the list,
 
 According to what I know, the Canberra distance between X and Y is : sum[
 (|x_i - y_i|) / (|x_i|+|y_i|) ] (with | | denoting the function
 'absolute value')
 In the source code of the canberra distance in the file distance.c, we
 find :
 
 sum = fabs(x[i1] + x[i2]);
 diff = fabs(x[i1] - x[i2]);
 dev = diff/sum;
 
 which correspond to the formula : sum[ (|x_i - y_i|) / (|x_i+y_i|) ]
 (note that this does not define a distance... This is correct when x_i
 and y_i are positive, but not when a value is negative.)
 
 Is it on purpose or is it a bug?
 
 It matches the documentation in ?dist, so it's not just a coding error.
   It will give the same value as your definition if the two items have
 the same sign (not only both positive), but different values if the
 signs differ.
 
 The first three links I found searching Google Scholar for Canberra
 distance all define it only for non-negative data.  One of them gave
 exactly the R formula (even though the absolute value in the denominator
 is redundant), the others just put x_i + y_i in the denominator.

G'day cobbers, 

Without checking the original sources (that I can't do before Monday), I'd
say that the Canberra distance was originally suggested only for
non-negative data (abundances of organisms which are non-negative if
observed directly). The fabs(x-y) notation was used just as a convenient
tool to get rid of the original pmin(x,y) for non-negative data -- which is
nice in R, but not so natural in C. Extension of the Canberra distance to
negative data probably makes a new distance perhaps deserving a new name
(Eureka distance?).
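
A minimal R sketch (not part of the original thread) contrasting the two
denominators discussed above for a pair of vectors with one negative value;
the thread concerns which of these dist() implemented at the time:

x <- c(1, -2, 3)
y <- c(1,  1, 3)
sum(abs(x - y) / (abs(x) + abs(y)))  # 1: the |x_i| + |y_i| denominator
sum(abs(x - y) / abs(x + y))         # 3: the |x_i + y_i| denominator

With only non-negative data the two sums agree, which is why the distinction
goes unnoticed there.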

If you ever go to Canberra and drive around you'll see that it's all going
through a roundabout after a roundabout, and going straight somewhere means
goin' 'round 'n' 'round. That may make you skeptical about the Canberra
distance. 

Cheers, Jazza Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] optional package dependency

2010-01-14 Thread Jari Oksanen
On Fri, 2010-01-15 at 00:12 -0600, Jeff Ryan wrote:
 Hi Ross,
 
 The quantmod package makes available routines from a variety of
 contributed packages, but gets around your issues with a bit of, um,
 trickery.
 
 Take a look here (unless your name is Kurt ;-) ):
 
 http://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/pkg/R/buildModel.methods.R?rev=367root=quantmodview=markup
 
 It would be nice to have Suggests really mean suggests to check, but I
 am sure there is a good reason it doesn't.
 
I agree: it would be nice to have Suggests really mean suggests, and
I 'suggested' so in an R-devel message of 20/9/05 with subject Shy
Suggestion (but this seems not exist in the R-devel archive?). I got
some support, but not from the right people, and so the R suggestion
remains the one you can't refuse or you'll wake up with a horse head in
your bed. I can live with this forced suggestion, although it is
sometimes painful, in particular in Mac or after re-installing
everything from scratch in Linux. 

The main argument was that building may fail later if you don't check
suggests early so that you must (de facto) depend on packages you
suggest. I'm sure many packages would fail now if the interpretation of
suggests was changed because the behaviour of suggests and depends
has been near identical for a long time and people have adapted. The
window of opportunity for another interpretation was when the checks for
undefined require()s were added to the R CMD check routines in 2005, but
then it was decided that suggests should be near equivalent to
depends, and this will stick.

Cheers, Jari Oksanen

-- 
Jari Oksanen, Department of Biology, Univ Oulu, FI-90014 Oulu, Finland
http://www.oulu.fi/~jarioksa http://vegan.r-forge.r-project.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] file.rename overwrites existing target (PR#14065)

2009-11-15 Thread Jari Oksanen
On 15/11/09 16:35 PM, jo...@web.de jo...@web.de wrote:

 Full_Name: Jens Oehlschlägel
 Version: 2.10.0
 OS: Windows XP Professional
 Submission from: (NULL) (85.181.158.112)
 
 
 file.rename() will successfully rename file a to b - even if b exists already.
 Though the documentation does not state what file.rename() will do in this
 case,
 I guess the expected behaviour is to fail and return FALSE.

The *expected* behaviour is to overwrite the old file. Your expectation
seems to be different, but overwriting or deleting the old file has been the
behaviour for ever (= since 1970s). This is how MacOS defines the behaviour
of the system command 'rename':

RENAME(2)   BSD System Calls Manual

NAME
 rename -- change the name of a file
...
DESCRIPTION
 The rename() system call causes the link named old to be renamed as
new.
 If new exists, it is first removed.

The behaviour is the same in all POSIX-like systems. Sensible systems like R
follow the documented standard behaviour.

Why would you expect that 'file.rename' fails if the 'new' file exists?

The unix command 'mv' (move) that does the 'rename' has a switch to override
the standard 'rename' behaviour and prompt before removing the 'new'
file. However, this switch is usually not the default in unixy systems,
unless defined so in the user's shell start-up script.
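
A minimal sketch (temporary files, not from the original report) showing the
documented overwrite behaviour in R itself:

a <- tempfile(); b <- tempfile()
writeLines("contents of a", a)
writeLines("contents of b", b)
file.rename(a, b)   # TRUE: the old 'b' is silently replaced
readLines(b)        # "contents of a"
file.exists(a)      # FALSE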

Cheers, Jari Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] eurodist example dataset is malformed

2009-08-16 Thread Jari Oksanen
Justin,

I suggest you try to remove your malformed eurodist and use the one in R.
The svn logs show no changes in eurodist since 2005 when 'r' was added to
'Gibralta' (it still has all the wrong distances which perhaps go back to
the poor quality of Cambridge Encyclopaedia). I also installed R 2.9.1 for
MacOS to see that there neither is a change in 'eurodist' in the Mac
distribution. My virgin eurodist in Mac was clean, with all its errors. All
this hints that you have a local copy of malformed eurodist in your
computer. Perhaps

rm(eurodist)
eurodist

will help.

Cheers, Jari Oksanen


On 15/08/09 06:13 AM, Justin Donaldson jjdon...@indiana.edu wrote:

 Here's my osx data/session info (identical after a re-install):
 
 class(eurodist)
 [1] data.frame
 sessionInfo()
 R version 2.9.1 (2009-06-26)
 i386-apple-darwin8.11.1
 
 locale:
 en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 
 
 -Justin
 
 
 
 On Thu, Aug 13, 2009 at 4:48 AM, Gavin Simpson gavin.simp...@ucl.ac.ukwrote:
 
 On Wed, 2009-08-12 at 20:26 -0400, Justin Donaldson wrote:
 The eurodist dataset (my favorite for mds) is malformed.  Instead of a
 standard distance matrix, it's a data frame.  The rownames have gotten
 'bumped' to a new anonymous dimension X.   It's possible to fix the
 data,
 but it messes up a lot of example code out there.
 
   X Athens Barcelona Brussels Calais ...
 1Athens  0  3313 2963   3175
 2 Barcelona   3313 0 1318   1326
 3  Brussels   2963  13180204
 4Calais   3175  1326  204  0
 5 Cherbourg   3339  1294  583460
 6   Cologne   2762  1498  206409
 ...
 
 Best,
 -Justin
 
 What version of R, platform, loaded packages etc? This is not what I see
 on Linux, 2.9.1-patched r49104.
 
 class(eurodist)
 [1] dist
 sessionInfo()
 R version 2.9.1 Patched (2009-08-07 r49104)
 x86_64-unknown-linux-gnu
 
 locale:
 
 LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;
 LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;
 LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods
 base
 
 loaded via a namespace (and not attached):
 [1] tools_2.9.1
 
 Have you tried this in a clean session to see if it persists there?
 
 If you can reproduce this in a clean session with an up-to-date R or
 R-Devel then send details of your R back to the list for further
 investigation.
 
 HTH
 
 G
 --
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
  Dr. Gavin Simpson [t] +44 (0)20 7679 0522
  ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
  Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
  Gower Street, London  [w]
 http://www.ucl.ac.uk/~ucfagls/
  UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 
 
 


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] cmdscale non-Euclidean dissimilarities

2009-06-10 Thread Jari Oksanen
Dear R gurus,

I think that cmdscale() function has problems with non-Euclidean
distances which have negative eigenvalues. The problems are two-fold: 

(1) Wrong eigenvalue is removed: there will be at least one zero
eigenvalue in cmdscale, and the function assumes it is the last one.
With non-Euclidean dissimilarities you will have negative eigenvalues,
and the zero eigenvalue will be the last positive one before negative
eigenvalues. Now the function returns the zero eigenvalue and
corresponding zero-eigenvector, but drops the last negative eigenvalue
(which has larger absolute value than any other negative eigenvalue). 

(2) Gower (1985) says that with non-Euclidean matrices and negative
eigenvalues you will have imaginary axes, and the distances on imaginary
axes (negative eigenvalues) should be subtracted from the distances on
real axes (positive eigenvalues). The formulation in the article is like
this (Gower 1985, p. 93):


f_{ii} + f_{jj} - 2 f_{ij} = d_{ij}^2 =
\sum_{p=1}^r (l_{pi} - l_{pj})^2 - \sum_{p=r+1}^{r+s} (l_{pi} - l_{pj})^2

This is the usual Pythagorean representation of squared distances in
terms of coordinates $l_{pi} (p = 1, 2 \ldots r+s)$, except that for
$p > r$ the coordinates become purely imaginary.


This also suggests that for GOF (goodness of fit) measure of cmdscale()
the negative eigenvalues should be subtracted from the sum of positive
eigenvalues. Currently, the function uses two ways: the sum of abs
values of eigenvalues (and it should be sum of eigenvalues with their
signs), and the sum of above-zero eigenvalues for the total. The latter
makes some sense, but the first looks non-Gowerian. 

Reference

Gower, J. C. (1985) Properties of Euclidean and non-Euclidean distance
matrices. Linear Algebra and its Applications 67, 81--97. 

The following change seems to avoid both problems. The change removes
only the eigenvalue that is closest to the zero. There may be more than
one zero eigenvalue (or of magnitude 1e-17), but this leaves the rest
there. It also changes the way the first alternative of GOF is
calculated. This changes the code as little as possible, and it still
leaves behind some cruft of the old code that assumed that last
eigenvalue is the zero eigenvalue. 

--- R/src/library/stats/R/cmdscale.R    (revision 48741)
+++ R/src/library/stats/R/cmdscale.R    (working copy)
@@ -56,6 +56,9 @@
         x[non.diag] <- (d[non.diag] + add.c)^2
     }
     e <- eigen(-x/2, symmetric = TRUE)
+    zeroeig <- which.min(abs(e$values))
+    e$values <- e$values[-zeroeig]
+    e$vectors <- e$vectors[ , -zeroeig, drop = FALSE]
     ev <- e$values[1L:k]
     if(any(ev < 0))
         warning(gettextf("some of the first %d eigenvalues are < 0", k),
@@ -63,9 +66,9 @@
     points <- e$vectors[, 1L:k, drop = FALSE] %*% diag(sqrt(ev), k)
     dimnames(points) <- list(rn, NULL)
     if (eig || x.ret || add) {
-        evalus <- e$values[-n]
+        evalus <- e$values
         list(points = points, eig = if(eig) ev, x = if(x.ret) x,
             ac = if(add) add.c else 0,
-            GOF = sum(ev)/c(sum(abs(evalus)), sum(evalus[evalus > 0])))
+            GOF = sum(ev)/c(sum(evalus), sum(evalus[evalus > 0])))
     } else points
 }
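
For illustration, a small sketch (not part of the original report): the road
distances in 'eurodist' are non-Euclidean, so cmdscale() produces negative
eigenvalues, which is where the two issues above show up:

mds <- cmdscale(eurodist, k = 2, eig = TRUE)
tail(mds$eig)     # trailing eigenvalues are negative ("imaginary axes")
sum(mds$eig < 0)  # number of negative eigenvalues
mds$GOF           # the two goodness-of-fit variants discussed above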

Best wishes, Jari Oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, FI-90014 Oulu, Finland
email jari.oksa...@oulu.fi, homepage http://cc.oulu.fi/~jarioksa/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Windows binary packages R-Forge

2008-05-07 Thread Jari Oksanen

On Wed, 2008-05-07 at 09:48 +0200, Yohan Chalabi wrote:
 Hi room,
 
 There seems to be a problem with the Windows building machines of
 R-Forge. All our packages with Fortran source code cannot be compiled
 for Windows. The error in the log file is
 
 make[3]: gfortran: Command not found
 
 It seems that gfortran is not installed. Is there any plan to fix this
 or am I doing something wrong on R-Forge?
 
 thanks in advance for your advises.
Dear Yohan Chalabi,

This has been reported on R-Forge support forum on 29 April, 2008:

https://r-forge.r-project.org/tracker/index.php?func=detail&aid=139&group_id=34&atid=194

Thomas Petzold even posted there the probable cure. I hope the issue
will be solved some day soon.

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] xspline(..., draw=FALSE) fails if there is no open device (PR#10727)

2008-02-08 Thread jari . oksanen
Full_Name: Jari Oksanen
Version:  2.6.2 RC (2008-02-07 r44369)
OS: Linux
Submission from: (NULL) (130.231.102.145)


Even if function xspline() is called with argument draw=FALSE, it requires a
graphics device (that it won't use since it was draw=FALSE). I ran into this
because I intended to use xspline within a function (that does not yet draw:
there is a plot method for that), and the function failed when called in a virgin
environment. 

Here is an example in a virgin environemt just after starting R:

> out <- xspline(c(0,1,0), c(1,0,1), draw=FALSE)
Error in xspline(c(0, 1, 0), c(1, 0, 1), draw = FALSE) : 
  plot.new has not been called yet
> str(out)
Error in str(out) : object 'out' not found

This works:

> plot(0)
> out <- xspline(c(0,1,0), c(1,0,1), draw=FALSE)
> str(out)
List of 2
 $ x: num [1:3] 0 1 0
 $ y: num [1:3] 1 0 1

This won't:

> dev.off()
null device 
          1 
> xspline(c(0,1,0), c(1,0,1), draw=FALSE)
Error in xspline(c(0, 1, 0), c(1, 0, 1), draw = FALSE) : 
  plot.new has not been called yet

R graphics internals are black magic to me. However, it seems that the error
message comes from function GCheckState(DevDesc *dd) in graphics.c, which is
called by do_xspline(SEXP call, SEXP op, SEXP args, SEXP env) in plot.c even
when xspline was called with draw = FALSE (and even before getting the argument
draw into do_xspline). It seems that a graphics device is needed somewhere even
with draw = FALSE, since moving the GCheckState() test after finding the value of
draw, and executing the test only if draw=TRUE, gave NaN as the numeric output. 

If this is documented behaviour, the documentation escaped my attention and I beg
your pardon. It may be useful to add a comment on the help page saying that an
open graphics device is needed even when unused with draw=FALSE.
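
A possible workaround sketch (not part of the original report) is to open a
throw-away device and call plot.new() before using xspline(..., draw = FALSE):

tmp <- tempfile(fileext = ".pdf")
pdf(tmp)
plot.new()          # satisfies the device / plot.new check
out <- xspline(c(0, 1, 0), c(1, 0, 1), draw = FALSE)
dev.off()
unlink(tmp)
str(out)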

Cheers, Jari Oksanen

 platform = i686-pc-linux-gnu
 arch = i686
 os = linux-gnu
 system = i686, linux-gnu
 status = RC
 major = 2
 minor = 6.2
 year = 2008
 month = 02
 day = 07
 svn rev = 44369
 language = R
 version.string = R version 2.6.2 RC (2008-02-07 r44369)

Locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils,
package:datasets, package:methods, Autoloads, package:base

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Saving Graphics File as .ps or .pdf (PR#10403)

2007-11-07 Thread Jari Oksanen

On Wed, 2007-11-07 at 10:51 +0100, Simone Giannerini wrote:
 [snip] (this is from pd = Peter Dalgaard)
  Maybe, but given the way things have been working lately, it might be
  better to emphasize
 
  (a) check the mailinglists
  (b) try R-patched
  (c) if in doubt, ask, rather than report as bug
 
  (Ideally, people would try the prerelease versions and problems like
  this would be caught before the actual release, but it seems that they
  prefer treating x.y.0 as a beta release...)
 
 
 I am sorry but I do not agree with point (b) for the very simple fact
 that the average Windows user do not know how to compile the source
 code and might not even want to learn how to do it. The point is that
 since (if I am correct) the great majority of  R users go Windows you
 would miss an important part of potential bug reports by requiring
 point (b) whereas (a) and (c) would suffice IMHO.
 Maybe if there were Win binaries of the prerelease version available
 some time before the release you would get much more feedback but I am
 just guessing.

First I must say that patched Windows binaries are available from CRAN
with one extra click -- Linux and poor MacOS users must use 'svn co' to
check out the patched version from the repository and compile from the
sources. The attribute "poor" for MacOS users was there because this is
a bigger step for Mac users than for Linux users (who can easily get and
install all tools they need and tend to have a different kind of
mentality). 

Then I must say that I do not like this policy either. I think that it is
fair to file a bug report against the latest release version in good
faith without being chastised and condemned. I know (like pd says above)
that some people really do treat x.y.0 as beta releases: a friend of
mine over here even refuses to install R x.x.0 versions just for this
reason (in fact, he's pd's mate, too, but perhaps pd can talk him over
to try x.x.0 versions). Filing a bug report against latest x.x.1
shouldn't be too bad either.

I guess the problem here is that R bug reports are linked to the Rd
mailing list, and reports on already fixed bugs really are irritating.
In more loosely connected bug reporting systems you simply could mark a
bug as a duplicate of # and mark it as resolved without generating
awfully lot of mail. Then it would be humanly possible to adopt a more
neutral way of answering to people who reported bugs in latest releases.
Probably that won't happen in the current environment.

Cheers, Jari Oksanen

PS. Please Mr Moderator, don't treat me so mean (*): I've subscribed to
this group although you regularly reject my mail as coming from a
non-member. 

(*) an extract from a classic song Mr R jumped the rabbit.
-- 
Jari Oksanen [EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R can't source() long lines (PR#10383)

2007-10-30 Thread Jari Oksanen

On Tue, 2007-10-30 at 08:10 +0100, [EMAIL PROTECTED] wrote:
 This is as documented in ?source, and so is not a bug.
 
This gives us a FAQ answer:

Q: What is the difference between a feature and a bug?

A: Features are documented, bugs are undocumented. If it is a bug, it is
either a bug in a function or a bug in the documentation (usually the
latter).

cheers, jari oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] boxplot() confuses x- and y-axes (PR#10345)

2007-10-15 Thread Jari Oksanen
On Mon, 2007-10-15 at 15:25 +0200, [EMAIL PROTECTED] wrote:
  ms == marc schwartz [EMAIL PROTECTED]
  on Mon, 15 Oct 2007 14:20:16 +0200 (CEST) writes:
 
 ms On Mon, 2007-10-15 at 10:30 +0200, [EMAIL PROTECTED] wrote:
  Full_Name: Bob O'Hara
  Version: 2.6.0
  OS: Windows XP
  Submission from: (NULL) (88.112.20.250)
  
  
  Using horizontal=TRUE with boxplot() confuses it as to what is an x- 
 or y-axis. 
  At least, xlim= and ylim= are the wrong way round, log=x (or y) 
 and xaxt=
  work as expected, I haven't looked at anything else.
  
  Some code to see if you can reproduce the bug (or discover it's in my 
 head...):
  
  boxplot(count ~ spray, data = InsectSprays)
  
  # Try to change x-axis:
  boxplot(count ~ spray, data = InsectSprays, xlim=c(0,50))
  
  # Plot horizontally:
  boxplot(count ~ spray, data = InsectSprays, horizontal=TRUE)
  
  # Now try to change x-axis:
  boxplot(count ~ spray, data = InsectSprays, horizontal=TRUE, 
 xlim=c(0,50))
  # Changes y-axis!
  
  # Now try to change y-axis:
  boxplot(count ~ spray, data = InsectSprays, horizontal=TRUE, 
 ylim=c(0,50))
  # Changes x-axis!
  
  # Plot x-axis on log scale:
  boxplot(count+1 ~ spray, data = InsectSprays, horizontal=TRUE, log=x)
  # Does indeed change x-axis
  
  # Don't add ticks on x-axis:
  boxplot(count ~ spray, data = InsectSprays, horizontal=TRUE, xaxt=n)
  # Works as expected.
 
 ms Hi Bob,
 
 ms No, it's not in your head. This is documented in ?bxp, which is the
 ms function that actually does the plotting for boxplot(). See the
 ms description of 'pars' in ?bxp:
 
 ms Currently, yaxs and ylim are used ‘along the boxplot’, i.e.,
 ms vertically, when horizontal is false, and xlim horizontally.
 
 ms So essentially, the named 'x' and 'y' axes are rotated 90 degrees when
 ms you use 'horizontal = TRUE', rather than the vertical axis always 
 being
 ms 'y' and the horizontal axis always being 'x'. This has been discussed 
 on
 ms the lists previously.
 
 Yes; thank you, Marc.
 
 And the reason for this is very sensible I think:
 
 If you have a longish  boxplot()  or  bxp() command,
 and you just want to go from vertical to horizontal or vice
 versa, it makes most sense just to have to change the
 'horizontal' flag and not having to see if there are other 'x*'
 and or 'y*' arguments that all need to be changed as well.
 
Except that you must change xaxt/yaxt and log="x"/log="y", which do not
follow the "along the boxplot" logic, and behave differently than
xlim/ylim. 

Nothing of this is fatal, but this probably needs more than one
iteration to find which way each of the x* and y* arguments works.
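
A short sketch (not part of the original thread) of the asymmetry described
above, using the same InsectSprays example:

## 'ylim' follows the data ("along the boxplot") when horizontal = TRUE ...
boxplot(count ~ spray, data = InsectSprays, horizontal = TRUE, ylim = c(0, 50))
## ... but 'log' (and '*axt') are named after the screen axes
boxplot(count + 1 ~ spray, data = InsectSprays, horizontal = TRUE, log = "x")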

cheers, jari oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] paste() with NAs .. change worth persuing?

2007-08-23 Thread Jari Oksanen

On 22 Aug 2007, at 20:16, Duncan Murdoch wrote:

 On 8/22/2007 11:50 AM, Martin Maechler wrote:
 Consider this example code

  c1 <- letters[1:7]; c2 <- LETTERS[1:7]
  c1[2] <- c2[3:4] <- NA
  rbind(c1,c2)

   ##   [,1] [,2] [,3] [,4] [,5] [,6] [,7]
   ## c1 "a"  NA   "c"  "d"  "e"  "f"  "g"
   ## c2 "A"  "B"  NA   NA   "E"  "F"  "G"

   paste(c1,c2)

   ## -> [1] "a A"  "NA B" "c NA" "d NA" "e E"  "f F"  "g G"

 where a more logical result would have entries 2:4 equal to
   NA
 i.e.,  as.character(NA)
 aka    NA_character_

 Is this worth persuing, or does anyone see why not?

 A fairly common use of paste is to put together reports for human
 consumption.  Currently we have

 p <- as.character(NA)
 paste("the value of p is", p)
 [1] "the value of p is NA"

 which looks reasonable. Would this become

 p <- as.character(NA)
 paste("the value of p is", p)
 [1] NA

 under your proposal?  (In a quick search I was unable to find a real
 example where this would happen, but it would worry me...)

At least stop() seems to include such a case:

  message <- paste(args, collapse = "")

and we may expect there are NAs sometimes in stop().

cheers, jazza
--
Jari Oksanen, Oulu, Finland

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] package check note: no visible global function definition (in functions using Tcl/Tk)

2007-06-12 Thread Jari Oksanen
On Tue, 2007-06-12 at 00:42 +0200, Henrik Bengtsson wrote:
 On 6/11/07, Seth Falcon [EMAIL PROTECTED] wrote:
  Prof Brian Ripley [EMAIL PROTECTED] writes:
 
   It seems that is happens if package tcltk is missing from the Depends:
   list in the DESCRIPTION file.  I just tested with Amelia and homals and
   that solved the various warnings in both cases.
 
  Adding tcltk to Depends may not always be the desried solution.  If
  tcltk is already in Suggests, for example, and the intention is to
  optionally provide GUI features, then the code may be correct as-is.
  That is, codetools will issue the NOTEs if you have a function that
  looks like:
 
 f <- function() {
     if (require(tcltk)) {
         someTcltkFunctionHere()
     } else {
         otherwiseFunction()
     }
 }
 
  There are a number of packages in the BioC repository that provide
  such optional features (not just for tcltk) and it would be nice to
  have a way of declaring the use such that the NOTE is silenced.
 
 Same scenario here: I am using Suggests and I found that the NOTEs go
 away if you call the function with double-colon (::), e.g.
 tcltk::someTcltkFunctionHere().
 
 I also got several NOTEs about non-declared objects if I used
 require(pkgname), but they go away with require("pkgname").
 
The real problem here is what the consequences are for CRAN auditing
with the new defaults. Do you have to pass these tests also? Do you
implement stricter package dependence checking? Do you still allow the
check circumvention device that Henrik, perhaps unwisely, revealed here
(that is package::function)? 

Just being curious, I ran checkUsagePackage() for my CRAN package
(vegan), and got 109 messages. 58 of these were "local variable
assigned but may not be used" and need to be checked. My first impression
was that they were just harmless leftovers, and removing those is not
among my top priorities, but may wait till September. Some were false
positives. Most of the rest (49 + 1 special case) were calls to
functions in other packages with require || stop in the function body.
I'd like to keep them like this, or at least with the circumvention
device. Please don't make this test a requirement in CRAN submissions!
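
For reference, a sketch (hypothetical function name, assuming the suggested
package 'akima') of the require || stop idiom that triggers these NOTEs:

myinterp <- function(x, y, z) {
    require(akima) || stop("package 'akima' is needed for this function")
    ## interp() is only found after require(); codetools cannot see that
    interp(x, y, z)
}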

One real error was detected also, but fixing that error broke the
function, since the rest of the function already was expecting erroneous
output to work correctly. 

I urge for more relaxed dependence checking allowing calls to other
packages in functions. I've been a Linux user since Red Hat 5.1 and I
know what is a dependence hell (package depending on package
depending ... depending on broken package). There already are some signs
of that in R, in particular in unsupported platforms like MacOS 10.3.9
where I have trouble in installing some packages that depend on
packages... (if somebody wonders why I still use MacOS 10.3.9, I can
give 129 reasons, each worth one Euro). 

cheers, Jari Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 'R CMD INSTALL mypkg' doesn't always update help pages

2007-06-05 Thread Jari Oksanen

On 6 Jun 2007, at 01:45, Herve Pages wrote:

 Hi,

 'R CMD INSTALL mypkg' and 'install.packages(mypkg, repos=NULL)' don't
 update mypkg help pages when mypkg is a source directory. They only
 install new help pages if there are some but they leave the already
 installed pages untouched. So you end up with mixed man pages from
 different versions of the package :-/

I have observed this, too.

cheers, Jari Oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] step() in sink() and Sweave()

2007-05-09 Thread Jari Oksanen
Dear developers,

I just noticed that the step() function currently prints the current model
using message(), but the resulting model using print(). The relevant
commands within the step() body are:

if (trace) message("Start:  AIC=", format(round(bAIC, 2)), "\n", 
    cut.string(deparse(as.vector(formula(fit)))), "\n")

(with example() output:) 
Start:  AIC=190.69
Fertility ~ Agriculture + Examination + Education + Catholic + 
Infant.Mortality

And later:

if (trace) print(aod[o, ])

(with example() output:)

   Df Sum of SqRSSAIC
- Examination   1  53.0 2158.1  189.9
none  2105.0  190.7
- Agriculture   1 307.7 2412.8  195.1
- Infant.Mortality  1 408.8 2513.8  197.0
- Catholic  1 447.7 2552.8  197.8
- Education 11162.6 3267.6  209.4

This is a nuisance if you want to divert output to a file with sink() or
use step() in Sweave: the header and the table go to different places,
and without the message() part the print() part is crippled.  It may be that
there is some way to avoid this, but obviously that needs some degree of
acrobatic R skills. 

An example of the behaviour:

sink(tempfile())
example(step)
sink()

I assume that the behaviour is intentional but searching NEWS did not
give any information or reasoning. Would it be sensible to go back to
the old behaviour? I found some Swoven files from R 2.4.0 that still put
both parts of the output to the same place. For the sake of Sweave and
sink, I'd prefer the one place to be stdout instead of stderr.
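
One possible workaround (an editorial sketch, not part of the original report)
is to divert both streams to the same connection:

out <- file("step-trace.txt", open = "wt")
sink(out)                     # stdout: the print()ed tables
sink(out, type = "message")   # stderr: the message()d headers
example(step)
sink(type = "message")
sink()
close(out)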

Best wishes, Jari Oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] prcomp: problem with zeros? (PR#8870)

2006-05-17 Thread Jari Oksanen

On 17 May 2006, at 22:02, [EMAIL PROTECTED] wrote:

 On Wed, 17 May 2006, [EMAIL PROTECTED] wrote:

 prcomp has a bug which causes following error

Error in svd(x, nu = 0) : infinite or missing values in 'x'

 on a valid data set (no Infs, no missing values). The error is most 
 likely
 caused by the zeros in data.

 Why do you say that?  Without a reproducible example, we cannot judge 
 what
 is going on.  If you called prcomp with scale=TRUE on a matrix that 
 has a
 completely zero (or constant) column, then this is a reasonable error
 message.

Constant columns (which is a likely reason here) indeed become NaN 
after scale(), but the error message was:

Error in svd(x, nu = 0) : infinite or missing values in 'x'

and calling this 'reasonable' is stretching the limits of reason.
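
A small reproduction sketch (not part of the original report; the exact error
wording depends on the R version):

x <- cbind(rnorm(10), rep(1, 10))  # second column is constant
colSums(is.nan(scale(x)))          # the constant column is all NaN after scaling
try(prcomp(x, scale. = TRUE))      # fails; older versions gave the svd() error above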

However, in general this is easy to solve:  scale() before the 
analysis and replace NaN with 0 (prcomp handles zeros).  For instance,

x <- scale(x)
x[is.nan(x)] <- 0
prcomp(x)

(and a friendly prcomp() would do this internally.)

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] typo in `eurodist'

2005-12-09 Thread Jari Oksanen
Dear all,

There really seem to be many exciting issues in spelling and in
detecting spelling errors. However, a more disturbing feature in
'eurodist' to me is that the distances seem to be wrong. There are
several cases where the triangle inequality is violated so that a trip
from A to B is shorter when you make a detour via X instead of going
directly (see require(fortunes); fortune("eurodist") for an example). A
quick look revealed that you can find such a shorter detour for 104 of
210 distances of 'eurodist'. There is no guarantee that these shortest
path distances would be correct, but at least they are metric.
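
A sketch (not part of the original message) of how such detours can be found
with a Floyd-Warshall style shortest-path pass over the distance matrix:

d <- as.matrix(eurodist)
sp <- d
for (k in seq_len(nrow(d)))
    for (i in seq_len(nrow(d)))
        for (j in seq_len(nrow(d)))
            sp[i, j] <- min(sp[i, j], sp[i, k] + sp[k, j])
sum(d[lower.tri(d)] > sp[lower.tri(sp)])  # distances with a shorter detour
round(d - sp)                             # the differences tabulated below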

Just for fun, here are the differences between actual eurodist's and
shortest paths among the towns in the eurodist data:

Athens Barcelona Brussels Calais Cherbourg
Barcelona 1036
Brussels   635 0
Calais 705130
Cherbourg  819 00  0
Cologne448   1390  0 0
Copenhagen 507   459  525537   545
Geneva 879 00  0 0
Gibralta  1037 00  0 2
Hamburg438   2140  0 0
Hook of Holland530 00  0 0
Lisbon1623 1  216135 0
Lyons 1022 00  0 0
Madrid1036 00  0 0
Marseilles1037 01  0 0
Milan  879410 1092
Munich 445610 26 0
Paris  798 00  0 0
Rome 0 00  991
Stockholm  508   459  525537   546
Vienna   070   32 35 0
Cologne Copenhagen Geneva Gibralta Hamburg
Barcelona
Brussels
Calais
Cherbourg
Cologne
Copenhagen  222
Geneva  790300
Gibralta  0499  0
Hamburg   0  0  0   49
Hook of Holland   0  0 460   0
Lisbon  3986626000 334
Lyons 0327  00   0
Madrid   26499  00  48
Marseilles1327  00   0
Milan 0171  0   40 102
Munich0  0  0   89   0
Paris 0450  00   0
Rome  0 98 810  29
Stockholm   215  0300  539   0
Vienna0  0  0   70   0
Hook of Holland Lisbon Lyons Madrid Marseilles
Barcelona
Brussels
Calais
Cherbourg
Cologne
Copenhagen
Geneva
Gibralta
Hamburg
Hook of Holland
Lisbon  240
Lyons 1  0
Madrid0  0 0
Marseilles1264 0  0
Milan 1744 0115  0
Munich067065 70160
Paris 0150 0  0  1
Rome  0608   134  1  0
Stockholm   581272   327539327
Vienna067270 41  0
Milan Munich Paris Rome Stockholm
Barcelona
Brussels
Calais
Cherbourg
Cologne
Copenhagen
Geneva
Gibralta
Hamburg
Hook of Holland
Lisbon
Lyons
Madrid
Marseilles
Milan
Munich  0
Paris  57  0
Rome0 2991
Stockholm 171  0   451  105
Vienna139  0 00 1

It seems that the marginal towns (Athens, Lisbon, Stockholm, Copenhagen)
have the largest discrepancies.

It also seems that the names are not 'localized', but weird English
forms are used for places like København and Wien so dear to the R core
developers.

cheers, jari oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Shy Suggestion?

2005-09-20 Thread Jari Oksanen
The R-exts manual says about 'Suggests' field in package DESCRIPTION:

The optional `Suggests' field uses the same syntax as `Depends' and
lists packages that are not necessarily needed.

However, this seems to be a suggestion you cannot refuse. If you suggest
packages:

(a line from DESCRIPTION):
Suggests: MASS, ellipse, rgl, mgcv, akima, lattice

This is what happens:

$ /tmp/R-alpha/bin/R CMD check vegan
* checking for working latex ... OK
* using log directory '/home/jarioksa/devel/R/vegan.Rcheck'
* using R version 2.2.0, 2005-09-19
* checking for file 'vegan/DESCRIPTION' ... OK
* this is package 'vegan' version '1.7-75'
... clip ...
* checking package dependencies ... ERROR
Packages required but not available:
  ellipse rgl akima

In my cultural context suggesting a package means that it is not
necessarily needed and the check should not fail, although some
functionality would be unavailable without those packages.  I want the
package to pass the tests in a clean standard environment without
forcing anybody to load any extra packages. Is there a possibility to be
modest and shy in suggestions so that it would be up to the user to get
those extra packages needed without requiring them in R CMD check?

I stumbled on this with earlier versions of R, and then my solution was
to suggest nothing. 

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Shy Suggestion?

2005-09-20 Thread Jari Oksanen
On Tue, 2005-09-20 at 09:42 -0400, Roger D. Peng wrote:
 I think this needs to fail because packages listed in 'Suggests:' may, for 
 example, be needed in the examples.  How can 'R CMD check' run the examples 
 and 
 verify that they are executable if those packages are not available?  I 
 suppose 
 you could put the examples in a \dontrun{}.
 
Yes, that's what I do, and exactly for that reason: if something is not
necessarily needed (= 'suggestion' in this culture), it should not be
required in tests. However, if I don't use \dontrun{} for a
non-recommended package, the check would fail and I would get the needed
information: so why should the check fail already when checking
DESCRIPTION?

cheers, jari oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] generic function argument list problem

2005-08-31 Thread Jari Oksanen
On Wed, 2005-08-31 at 08:09 +0100, Robin Hankin wrote:
 Hi
 
 it says in R-exts that
 
 
  A method must have all the arguments of the generic,  
 including ... if the generic does.
  A method must have arguments in exactly the same order as the  
 generic.
  A method should use the same defaults as the generic.
 
 
 So, how come the arguments for rep() are (x, times, ...) and the  
 arguments
 for rep.default() are  (x, times, length.out, each, ...) ?
Shouldn't  
 these be the same?
 
 
 I am writing a rep() method for objects with class octonion, and
 my function rep.octonion() has argument list (x, times, length.out,  
 each, ...)
 just like rep.default(),   but  R CMD check complains about it,
pointing
 out that rep() and rep.octonion() have different arguments.
 
 What do I have to do to my rep.octonion() function to make my package
   pass R CMD check without warning?
 
I cannot repeat your problem. Probably you did something differently
than you said (like omitted ... , misspelled times as time or
something else in your rep.octonion).

This is what I tried.

In R:
> str(rep)
function (x, times, ...)  
> rep.octonion <- function(x, times, length.out, each, ...) {}
> package.skeleton("octonion", "rep.octonion")
Creating directories ...
Creating DESCRIPTION ...
Creating READMEs ...
Saving functions and data ...
Making help files ...
Created file named './octonion/man/rep.octonion.Rd'.
Edit the file and move it to the appropriate directory.
Done.
Further steps are described in ./octonion/README

Then I edited octonion/man/rep.octonion.Rd so that it uses the generic
and passes R CMD check (virgin Rd files produced by package.skeleton
fail the test, which I found a bit weird). Here are the minimum changes
you need to pass the tests.

--- rep.octonion.Rd.orig    2005-08-31 10:56:36.0 +0300
+++ rep.octonion.Rd         2005-08-31 10:55:25.0 +0300
@@ -7,5 +7,5 @@
 }
 \usage{
-rep.octonion(x, times, length.out, each, ...)
+\method{rep}{octonion}(x, times, length.out, each, ...)
 }
 %- maybe also 'usage' for other objects documented here.
@@ -18,5 +18,5 @@
 }
 \details{
-  ~~ If necessary, more details than the __description__  above ~~
+  ~~ If necessary, more details than the description  above ~~
 }
 \value{
@@ -31,7 +31,7 @@
 \note{ ~~further notes~~ }

- ~Make other sections like Warning with \section{Warning }{} ~

-\seealso{ ~~objects to See Also as \code{\link{~~fun~~}}, ~~~ }
+
+\seealso{ ~~objects to See Also as \code{\link{rep}}, ~~~ }
 \examples{
 ## Should be DIRECTLY executable !! 
@@ -42,4 +42,4 @@
 function(x, time, length.out, each, ...) {}
 }
-\keyword{ ~kwd1 }% at least one, from doc/KEYWORDS
-\keyword{ ~kwd2 }% __ONLY ONE__ keyword per line
+\keyword{ models }% at least one, from doc/KEYWORDS
+

So this replaces rep.octonion with \method{rep}{octonion}, removes __
from the description (these cause latex errors), removes a hanging top-level
text "Make other sections...", and removes a link to the non-existent
~~fun~~ (I'm not sure if adding a real keyword is necessary).

This passes tests. Including 

* checking S3 generic/method consistency ... OK

Conclusion: check your files. (It is pain: been there, done that.)

cheers, jari oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Why should package.skeleton() fail R CMD check?

2005-08-31 Thread Jari Oksanen
I find it a bit peculiar that a package skeleton created with a utils
function package.skeleton() fails subsequent R CMD check. I do
understand that the function is intended to produce only a skeleton that
should be edited by the package author. I think that it would be
justified to say that the skeleton *should* fail the test. However, I
have two arguments against intentional failure:

* When you produce a skeleton, a natural thing is to see if it works and
run R CMD check. It is baffling (but educating) if this fails.

* The second argument is more major: If you produce a package with
several functions, you want to edit one Rd file at a time to see what
errors you made. You don't want to correct errors in other Rd files not
yet edited by you to see your own errors. This kind of incremental
editing is much more pleasant, as following strict R code is painful
even with your own mistakes.

The failure comes only from Rd files, and it seems that the violating
code is produced by the prompt.default function hidden in the utils
namespace. I attach a unified diff file which shows the minimal set of
changes I had to make to utils:::prompt.default to produce Rd files
passing R CMD check. There are still two warnings: one on missing source
files and another on missing keywords, but these are not fatal. This
still produces bad looking latex. These are the changes I made 

* I replaced __description__ with description, since __ will give
latex errors. 

* I enclosed Make other sections within Note, so that it won't give
error on stray top level text. It will now appear as numbered latex
\section{} in dvi file, but that can the package author correct.

* I replaced reference to a non-existent function ~~fun~~ with a
reference to function help. 

I'm sorry for the formatting of the diff file: my emacs/ESS is cleverer
than I and changes indentation and line breaks against my will.

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Why should package.skeleton() fail R CMD check?

2005-08-31 Thread Jari Oksanen
On Wed, 2005-08-31 at 11:23 +0200, Martin Maechler wrote:

 Since you didn't use  text/plain  as content type, your
 attachment didn't make it to the list anyway, 

Yeah, I noticed. 

 and you have a
 second chance:
 
 Please use a diff -u against
 
https://svn.R-project.org/R/trunk/src/library/utils/R/prompt.R
 
 or maybe even a diff -ubBw ... one.
 

Here comes a unified diff against the svn source of prompt.R. I hope I made
all the same changes as previously. At least package skeletons made with this
pass R CMD check with the same two warnings as previously (after you
beat the namespace -- oh how I hate namespaces):

--- prompt.R        2005-08-31 12:30:28.0 +0300
+++ prompt.R.new    2005-08-31 12:32:13.0 +0300
@@ -96,5 +96,5 @@
          details = c("\\details{",
              paste("  ~~ If necessary, more details than the",
-                   "__description__  above ~~"),
+                   "description above ~~"),
              "}"),
          value = c("\\value{",
@@ -108,11 +108,11 @@
                  "literature/web site here ~ }"),
          author = "\\author{ ~~who you are~~ }",
-         note = c("\\note{ ~~further notes~~ }",
+         note = c("\\note{ ~~further notes~~ ",
              "",
              paste(" ~Make other sections like Warning with",
                    "\\section{Warning }{} ~"),
-             ),
+             "}"),
          seealso = paste("\\seealso{ ~~objects to See Also as",
-             "\\code{\\link{~~fun~~}}, ~~~ }"),
+             "\\code{\\link{help}}, ~~~ }"),
          examples = c("\\examples{",
              "## Should be DIRECTLY executable !! ",

Cheers,

Jari Oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] problem using model.frame()

2005-08-17 Thread Jari Oksanen

On 18 Aug 2005, at 1:49, Gavin Simpson wrote:

 On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
 GS == Gavin Simpson [EMAIL PROTECTED]
 on Tue, 16 Aug 2005 18:44:23 +0100 writes:

 GS On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
 GS wrote:
 On 8/16/05, Gavin Simpson [EMAIL PROTECTED]
 wrote:  On Tue, 2005-08-16 at 11:25 -0400, Gabor
 Grothendieck wrote:   It can handle data frames like
 this:

 model.frame(y1)   or   model.frame(~., y1)

 Thanks Gabor,

 Yes, I know that works, but I want the function
 coca.formula to accept a  formula like this y2 ~ y1,
 with both y1 and y2 being data frames. It is

 The expressions I gave work generally (i.e. lm, glm,
 ...), not just in model.matrix, so would it be ok if the
 user just does this?

 yourfunction(y2 ~., y1)

 GS Thanks again Gabor for your comments,

 GS I'd prefer the y1 ~ y2 as data frames - as this is the
 GS most natural way of doing things. I'd like to have (y2
 GS ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
 GS work - silently without any trouble.

 I'm sorry, Gavin, I tend to disagree quite a bit.

 The formula notation has quite a history in the S language, and
 AFAIK never was the idea to use data.frames as formula
 components, but rather as environments in which formula
 components are looked up --- exactly as Gabor has explained.

 Hi Martin, thanks for your comments,

 But then one could have a matrix of variables on the rhs of the formula
 and it would work - whether this is a documented feature or un-intended
 side-effect of matrices being stored as vectors with dims, I don't 
 know.

 And whilst the formula may have a long history, a number of packages
 have extended the interface to implement a specific feature, which 
 don't
 work with standard functions like lm, glm and friends. I don't see how
 what I wanted to achieve is greatly different to that or using a 
 matrix.

 To break with such a deeply rooted principle,
 you should have very very good reasons, because you're breaking
 the concepts on which all other uses of formulae are based.
 And this would potentially lead to much confusion of your users,
 at least in the way they should learn to think about what
 formulae mean.

 In the end I managed to treat y1 ~ y2 (both data frames) as a special
 case, which allows the existing formula notation to work as well, so I
 can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
 is what I wanted all along, to extend my interface (not do anything to
 R's formulae), but to also work in the traditional sense.

 The model I am writing code for really is modelling the relationship
 between two matrices of data. In one version of the method, there is
 real equivalence between both sides of the formula so it would seem odd
 to treat the two sides of the formula differently. At least to me ;-)

It seems that I may be responsible for one of these extensions (lhs as 
a data.frame in cca and rda in vegan package). There the response (lhs) 
is multivariate or a multispecies community, and you must take that as 
a whole without manipulation (and if you tried using VGAM you would see it 
really is painful to define a lhs with, say, 127 elements). However, in 
general you shouldn't use models where you use all the 'explanatory' 
variables (rhs) that you happen to have by accident. So much bad science 
has been created with that approach even in your field, Gav. The whole 
idea of a formula is the ability to choose from candidate variables. That 
is: to build a model. Therefore you have one-sided formulae in prcomp() 
and princomp(): you can say prcomp(~ x1 + log(x2) + x4, data) or 
prcomp(~ . - x3, data). I think you should try to keep it so. Instead, 
do as Gabor suggested: you could have a function coca.default or 
coca.matrix with interface:

coca.matrix(matx, maty, matz) -- or you can name this as coca.default.

and coca.formula which essentially parses your formula and returns a 
list of matrices you need:

coca.formula <- function(formula, data)
{
   matricesout <- parsemyformula(formula, data)
   coca(matricesout$matx, matricesout$maty, matricesout$matz)
}
Then you need the generic: coca <- function(...) UseMethod("coca") and 
it's done (but it fails in R CMD check unless you add ... in all 
specific functions...). The real work is always done in coca.matrix (or 
coca.default), and the others just chew your data into suitable form 
for your workhorse.

If then somebody thinks that they need all possible variables as 
'explanatory' variables (or perhaps constraints in your case), they 
just call the function as

coca(matx, maty, matz)

And if you have coca.data.frame they don't need 'quacking' with extra 
steps:

coca.data.frame <- function(dfx, dfy, dfz) coca(as.matrix(dfx), 
as.matrix(dfy), as.matrix(dfz))

This you call as coca(dfx, dfy, dfz) and there you go.
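
Pulling those pieces together, a self-contained sketch (hypothetical names;
parsemyformula replaced here by standard model.matrix() machinery) of the
dispatch pattern described above:

coca <- function(x, ...) UseMethod("coca")

coca.default <- function(x, y, ...) {
    ## the real workhorse: everything arrives here as matrices
    list(x = as.matrix(x), y = as.matrix(y))
}

coca.formula <- function(formula, data, ...) {
    ## take the lhs as a whole, expand the rhs from 'data'
    lhs <- eval(formula[[2]], environment(formula))
    rhs <- model.matrix(delete.response(terms(formula, data = data)), data)
    coca(as.matrix(lhs), rhs, ...)
}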

The essential feature in formula is the ability to define the model. 
Don't give it away.

cheers, jazza
--
Jari Oksanen