Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-18 Thread Jari Oksanen
I am afraid that these suggestions may not work. There are more choices than 
Win32 and Win64, including several flavours of BLAS/Lapack which probably are 
involved if you evaluate eigenvalues, and also differences in hardware, 
compilers and phase of the moon.  If there are several equal eigenvalues, any 
solution of axes is arbitrary and it can be made stable for testing only by 
chance. If you have M equal eigenvalues, you should try to find a test that the 
M-dimensional (sub)space is approximately correct irrespective of random 
orientation of axes in this subspace.

Cheers, Jari Oksanen

On 18 May 2018, at 00:06 am, Kevin Coombes 
> wrote:

Yes; but I have been running around all day without time to sit down and
try them. The suggestions make sense, and I'm looking forward to
implementing them.

On Thu, May 17, 2018, 3:55 PM Ben Bolker 
> wrote:

There have been various comments in this thread (by me, and I think
Duncan Murdoch) about how you can identify the platform you're running
on (some combination of .Platform and/or R.Version()) and use it to
write conditional statements so that your tests will only be compared
with reference values that were generated on the same platform ... did
those get through?  Did they make sense?

On Thu, May 17, 2018 at 3:30 PM, Kevin Coombes
> wrote:
Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are
the issue. The matrices I am using are all nonsingular, and the various
algorithms have no problem computing the eigenvalues correctly (up to
numerical errors that I can bound and thus account for on tests by
rounding
appropriately). But an eigenvalue of multiplicity M has an M-dimensional
eigenspace with no preferred basis. So, any M-dimensional  (unitary)
change
of basis is permitted. That's what give rise to the lack of
reproducibility
across architectures. The choice of basis appears to use different
heuristics on 32-bit windows than on 64-bit Windows or Linux machines.
As a
result, I can't include the tests I'd like as part of a CRAN submission.

On Thu, May 17, 2018, 2:29 PM William Dunlap 
> wrote:

Your explanation needs to be a bit more general in the case of identical
eigenvalues - each distinct eigenvalue has an associated subspace, whose
dimension is the number repeats of that eigenvalue and the eigenvectors
for
that eigenvalue are an orthonormal basis for that subspace.  (With no
repeated eigenvalues this gives your 'unique up to sign'.)

E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of
0

x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
x
  [,1] [,2] [,3] [,4] [,5]
 [1,]10001
 [2,]01001
 [3,]00101
 [4,]00000
 [5,]11103
the following give valid but different (by more than sign) eigen vectors

e1 <- structure(list(values = c(4, 1, 0.999, 0,
-2.22044607159862e-16
), vectors = structure(c(-0.288675134594813, -0.288675134594813,
-0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
-0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
-0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
-0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values",
"vectors"), class = "eigen")
e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
   vectors = structure(c(0.288675134594813, 0.288675134594813,
   0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
   0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
   0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
   -1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5,
   0, 0.5), .Dim = c(5L, 5L))), .Names = c("values", "vectors"
), class = "eigen")

I.e.,
all.equal(crossprod(e1$vectors), diag(5), tol=0)
[1] "Mean relative difference: 1.407255e-15"
all.equal(crossprod(e2$vectors), diag(5), tol=0)
[1] "Mean relative difference: 3.856478e-15"
all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)
[1] "Mean relative difference: 1.110223e-15"
all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)
[1] "Mean relative difference: 9.069735e-16"

e1$vectors
  [,1]   [,2]  [,3] [,4] [,5]
[1,] -0.2886751  0.000  8.164966e-010 -0.5
[2,] -0.2886751  0.7071068 -4.082483e-010 -0.5
[3,] -0.2886751 -0.7071068 -4.082483e-010 -0.5
[4,]  0.000  0.000  0.00e+00   -1  0.0
[5,] -0.8660254  0.000 -6.106227e-160  0.5
e2$vectors
 [,1]  [,2]  [,3] [,4] [,5]
[1,] 0.2886751 -7.844376e-01  2.265489e-010 -0.5
[2,] 0.2886751  5.884158e-01  5.660684e-010 -0.5
[3,] 0.2886751  1.960217e-01 -7.926173e-010 -0.5
[4,] 0.000  0.00e+00  0.00e+00   -1  0.0
[5,] 0.8660254  

Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-18 Thread Facundo Muñoz
In my opinion, the underlying problem is that you are checking whether
the test reproduces exactly your pre-computed solution, while there
actually exist other valid answers.

I believe you want to check whether the sub-spaces are the same, not
whether the bases are identical (which can depend on platform, linear
algebra library, etc.)

ƒacu.-



On 05/17/2018 09:30 PM, Kevin Coombes wrote:
> Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are
> the issue. The matrices I am using are all nonsingular, and the various
> algorithms have no problem computing the eigenvalues correctly (up to
> numerical errors that I can bound and thus account for on tests by rounding
> appropriately). But an eigenvalue of multiplicity M has an M-dimensional
> eigenspace with no preferred basis. So, any M-dimensional  (unitary) change
> of basis is permitted. That's what give rise to the lack of reproducibility
> across architectures. The choice of basis appears to use different
> heuristics on 32-bit windows than on 64-bit Windows or Linux machines. As a
> result, I can't include the tests I'd like as part of a CRAN submission.
>
> On Thu, May 17, 2018, 2:29 PM William Dunlap  wrote:
>
>> Your explanation needs to be a bit more general in the case of identical
>> eigenvalues - each distinct eigenvalue has an associated subspace, whose
>> dimension is the number repeats of that eigenvalue and the eigenvectors for
>> that eigenvalue are an orthonormal basis for that subspace.  (With no
>> repeated eigenvalues this gives your 'unique up to sign'.)
>>
>> E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of 0
>>
>>   > x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
>>   > x
>>[,1] [,2] [,3] [,4] [,5]
>>   [1,]10001
>>   [2,]01001
>>   [3,]00101
>>   [4,]00000
>>   [5,]11103
>> the following give valid but different (by more than sign) eigen vectors
>>
>> e1 <- structure(list(values = c(4, 1, 0.999, 0,
>> -2.22044607159862e-16
>> ), vectors = structure(c(-0.288675134594813, -0.288675134594813,
>> -0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
>> -0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
>> -0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
>> -0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values",
>> "vectors"), class = "eigen")
>> e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
>> vectors = structure(c(0.288675134594813, 0.288675134594813,
>> 0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
>> 0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
>> 0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
>> -1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5,
>> 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values", "vectors"
>> ), class = "eigen")
>>
>> I.e.,
>>> all.equal(crossprod(e1$vectors), diag(5), tol=0)
>> [1] "Mean relative difference: 1.407255e-15"
>>> all.equal(crossprod(e2$vectors), diag(5), tol=0)
>> [1] "Mean relative difference: 3.856478e-15"
>>> all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)
>> [1] "Mean relative difference: 1.110223e-15"
>>> all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)
>> [1] "Mean relative difference: 9.069735e-16"
>>
>>> e1$vectors
>>[,1]   [,2]  [,3] [,4] [,5]
>> [1,] -0.2886751  0.000  8.164966e-010 -0.5
>> [2,] -0.2886751  0.7071068 -4.082483e-010 -0.5
>> [3,] -0.2886751 -0.7071068 -4.082483e-010 -0.5
>> [4,]  0.000  0.000  0.00e+00   -1  0.0
>> [5,] -0.8660254  0.000 -6.106227e-160  0.5
>>> e2$vectors
>>   [,1]  [,2]  [,3] [,4] [,5]
>> [1,] 0.2886751 -7.844376e-01  2.265489e-010 -0.5
>> [2,] 0.2886751  5.884158e-01  5.660684e-010 -0.5
>> [3,] 0.2886751  1.960217e-01 -7.926173e-010 -0.5
>> [4,] 0.000  0.00e+00  0.00e+00   -1  0.0
>> [5,] 0.8660254  4.464109e-17 -1.112441e-160  0.5
>>
>>
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Thu, May 17, 2018 at 10:14 AM, Martin Maechler <
>> maech...@stat.math.ethz.ch> wrote:
>>
 Duncan Murdoch 
 on Thu, 17 May 2018 12:13:01 -0400 writes:
>>> > On 17/05/2018 11:53 AM, Martin Maechler wrote:
>>> >>> Kevin Coombes ... on Thu, 17
>>> >>> May 2018 11:21:23 -0400 writes:
>>>
>>> >>[..]
>>>
>>> >> > [3] Should the documentation (man page) for "eigen" or
>>> >> > "mvrnorm" include a warning that the results can change
>>> >> > from machine to machine (or between things like 32-bit and
>>> >> > 64-bit R on the same machine) because of difference in
>>> >> > linear algebra modules? (Possibly including the statement
>>> >> > that "set.seed" won't save you.)
>>>
>>> 

Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-18 Thread Martin Maechler
> William Dunlap 
> on Thu, 17 May 2018 11:28:50 -0700 writes:

> Your explanation needs to be a bit more general in the
> case of identical eigenvalues - each distinct eigenvalue
> has an associated subspace, whose dimension is the number
> repeats of that eigenvalue and the eigenvectors for that
> eigenvalue are an orthonormal basis for that subspace.
> (With no repeated eigenvalues this gives your 'unique up
> to sign'.)

Thank you, Bill, notably for the concrete example of non-trivial
eigenspaces (per eigenvector). 
Note I did say

  "... such that at least for the good cases where all eigenspaces
   are 1-dimensional, ..."

knowing well that only in that case it "is easy".
I have a gut feeling but may be wrong that such simplistic post
processing may also help (to get cross-platform reproducibility)
in the case of MASS::mvrnorm() where repeated eigenvalues will
be common in practice.

Martin

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-17 Thread William Dunlap
But why would you want to tie your tests to specific platforms, if
mathematically all those results are equivalent?  You could compare the
orthogonal complements from a full rank matrix (say the identity) to each
expected eigenspace.  E.g., for the example I gave above, where e2 and e1
gave different basis for 2 eigenspaces:

> all.equal(tol=0,
+   residuals(lm.fit(y=diag(5), e1$vectors[,1,drop=FALSE])),
+   residuals(lm.fit(y=diag(5), e2$vectors[,1,drop=FALSE])))
[1] "Mean relative difference: 4.791489e-16"
> all.equal(tol=0,
+   residuals(lm.fit(y=diag(5), e1$vectors[,2:3,drop=FALSE])),
+   residuals(lm.fit(y=diag(5), e2$vectors[,2:3,drop=FALSE])))
[1] "Mean relative difference: 1.469377e-15"
> all.equal(tol=0,
+   residuals(lm.fit(y=diag(5), e1$vectors[,4:5,drop=FALSE])),
+   residuals(lm.fit(y=diag(5), e2$vectors[,4:5,drop=FALSE])))
[1] TRUE


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, May 17, 2018 at 12:55 PM, Ben Bolker  wrote:

> There have been various comments in this thread (by me, and I think
> Duncan Murdoch) about how you can identify the platform you're running
> on (some combination of .Platform and/or R.Version()) and use it to
> write conditional statements so that your tests will only be compared
> with reference values that were generated on the same platform ... did
> those get through?  Did they make sense?
>
> On Thu, May 17, 2018 at 3:30 PM, Kevin Coombes
>  wrote:
> > Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are
> > the issue. The matrices I am using are all nonsingular, and the various
> > algorithms have no problem computing the eigenvalues correctly (up to
> > numerical errors that I can bound and thus account for on tests by
> rounding
> > appropriately). But an eigenvalue of multiplicity M has an M-dimensional
> > eigenspace with no preferred basis. So, any M-dimensional  (unitary)
> change
> > of basis is permitted. That's what give rise to the lack of
> reproducibility
> > across architectures. The choice of basis appears to use different
> > heuristics on 32-bit windows than on 64-bit Windows or Linux machines.
> As a
> > result, I can't include the tests I'd like as part of a CRAN submission.
> >
> > On Thu, May 17, 2018, 2:29 PM William Dunlap  wrote:
> >
> >> Your explanation needs to be a bit more general in the case of identical
> >> eigenvalues - each distinct eigenvalue has an associated subspace, whose
> >> dimension is the number repeats of that eigenvalue and the eigenvectors
> for
> >> that eigenvalue are an orthonormal basis for that subspace.  (With no
> >> repeated eigenvalues this gives your 'unique up to sign'.)
> >>
> >> E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of
> 0
> >>
> >>   > x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
> >>   > x
> >>[,1] [,2] [,3] [,4] [,5]
> >>   [1,]10001
> >>   [2,]01001
> >>   [3,]00101
> >>   [4,]00000
> >>   [5,]11103
> >> the following give valid but different (by more than sign) eigen vectors
> >>
> >> e1 <- structure(list(values = c(4, 1, 0.999, 0,
> >> -2.22044607159862e-16
> >> ), vectors = structure(c(-0.288675134594813, -0.288675134594813,
> >> -0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
> >> -0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
> >> -0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
> >> -0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values",
> >> "vectors"), class = "eigen")
> >> e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
> >> vectors = structure(c(0.288675134594813, 0.288675134594813,
> >> 0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
> >> 0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
> >> 0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
> >> -1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5,
> >> 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values", "vectors"
> >> ), class = "eigen")
> >>
> >> I.e.,
> >> > all.equal(crossprod(e1$vectors), diag(5), tol=0)
> >> [1] "Mean relative difference: 1.407255e-15"
> >> > all.equal(crossprod(e2$vectors), diag(5), tol=0)
> >> [1] "Mean relative difference: 3.856478e-15"
> >> > all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)
> >> [1] "Mean relative difference: 1.110223e-15"
> >> > all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)
> >> [1] "Mean relative difference: 9.069735e-16"
> >>
> >> > e1$vectors
> >>[,1]   [,2]  [,3] [,4] [,5]
> >> [1,] -0.2886751  0.000  8.164966e-010 -0.5
> >> [2,] -0.2886751  0.7071068 -4.082483e-010 -0.5
> >> [3,] -0.2886751 -0.7071068 -4.082483e-010 -0.5
> >> [4,]  0.000  0.000  0.00e+00   -1  0.0
> >> [5,] -0.8660254  0.000 

Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-17 Thread Kevin Coombes
Yes; but I have been running around all day without time to sit down and
try them. The suggestions make sense, and I'm looking forward to
implementing them.

On Thu, May 17, 2018, 3:55 PM Ben Bolker  wrote:

> There have been various comments in this thread (by me, and I think
> Duncan Murdoch) about how you can identify the platform you're running
> on (some combination of .Platform and/or R.Version()) and use it to
> write conditional statements so that your tests will only be compared
> with reference values that were generated on the same platform ... did
> those get through?  Did they make sense?
>
> On Thu, May 17, 2018 at 3:30 PM, Kevin Coombes
>  wrote:
> > Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are
> > the issue. The matrices I am using are all nonsingular, and the various
> > algorithms have no problem computing the eigenvalues correctly (up to
> > numerical errors that I can bound and thus account for on tests by
> rounding
> > appropriately). But an eigenvalue of multiplicity M has an M-dimensional
> > eigenspace with no preferred basis. So, any M-dimensional  (unitary)
> change
> > of basis is permitted. That's what give rise to the lack of
> reproducibility
> > across architectures. The choice of basis appears to use different
> > heuristics on 32-bit windows than on 64-bit Windows or Linux machines.
> As a
> > result, I can't include the tests I'd like as part of a CRAN submission.
> >
> > On Thu, May 17, 2018, 2:29 PM William Dunlap  wrote:
> >
> >> Your explanation needs to be a bit more general in the case of identical
> >> eigenvalues - each distinct eigenvalue has an associated subspace, whose
> >> dimension is the number repeats of that eigenvalue and the eigenvectors
> for
> >> that eigenvalue are an orthonormal basis for that subspace.  (With no
> >> repeated eigenvalues this gives your 'unique up to sign'.)
> >>
> >> E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of
> 0
> >>
> >>   > x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
> >>   > x
> >>[,1] [,2] [,3] [,4] [,5]
> >>   [1,]10001
> >>   [2,]01001
> >>   [3,]00101
> >>   [4,]00000
> >>   [5,]11103
> >> the following give valid but different (by more than sign) eigen vectors
> >>
> >> e1 <- structure(list(values = c(4, 1, 0.999, 0,
> >> -2.22044607159862e-16
> >> ), vectors = structure(c(-0.288675134594813, -0.288675134594813,
> >> -0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
> >> -0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
> >> -0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
> >> -0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values",
> >> "vectors"), class = "eigen")
> >> e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
> >> vectors = structure(c(0.288675134594813, 0.288675134594813,
> >> 0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
> >> 0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
> >> 0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
> >> -1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5,
> >> 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values", "vectors"
> >> ), class = "eigen")
> >>
> >> I.e.,
> >> > all.equal(crossprod(e1$vectors), diag(5), tol=0)
> >> [1] "Mean relative difference: 1.407255e-15"
> >> > all.equal(crossprod(e2$vectors), diag(5), tol=0)
> >> [1] "Mean relative difference: 3.856478e-15"
> >> > all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)
> >> [1] "Mean relative difference: 1.110223e-15"
> >> > all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)
> >> [1] "Mean relative difference: 9.069735e-16"
> >>
> >> > e1$vectors
> >>[,1]   [,2]  [,3] [,4] [,5]
> >> [1,] -0.2886751  0.000  8.164966e-010 -0.5
> >> [2,] -0.2886751  0.7071068 -4.082483e-010 -0.5
> >> [3,] -0.2886751 -0.7071068 -4.082483e-010 -0.5
> >> [4,]  0.000  0.000  0.00e+00   -1  0.0
> >> [5,] -0.8660254  0.000 -6.106227e-160  0.5
> >> > e2$vectors
> >>   [,1]  [,2]  [,3] [,4] [,5]
> >> [1,] 0.2886751 -7.844376e-01  2.265489e-010 -0.5
> >> [2,] 0.2886751  5.884158e-01  5.660684e-010 -0.5
> >> [3,] 0.2886751  1.960217e-01 -7.926173e-010 -0.5
> >> [4,] 0.000  0.00e+00  0.00e+00   -1  0.0
> >> [5,] 0.8660254  4.464109e-17 -1.112441e-160  0.5
> >>
> >>
> >>
> >>
> >>
> >> Bill Dunlap
> >> TIBCO Software
> >> wdunlap tibco.com
> >>
> >> On Thu, May 17, 2018 at 10:14 AM, Martin Maechler <
> >> maech...@stat.math.ethz.ch> wrote:
> >>
> >>> > Duncan Murdoch 
> >>> > on Thu, 17 May 2018 12:13:01 -0400 writes:
> >>>
> >>> > On 17/05/2018 11:53 AM, Martin Maechler wrote:
> >>> 

Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-17 Thread Ben Bolker
There have been various comments in this thread (by me, and I think
Duncan Murdoch) about how you can identify the platform you're running
on (some combination of .Platform and/or R.Version()) and use it to
write conditional statements so that your tests will only be compared
with reference values that were generated on the same platform ... did
those get through?  Did they make sense?

On Thu, May 17, 2018 at 3:30 PM, Kevin Coombes
 wrote:
> Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are
> the issue. The matrices I am using are all nonsingular, and the various
> algorithms have no problem computing the eigenvalues correctly (up to
> numerical errors that I can bound and thus account for on tests by rounding
> appropriately). But an eigenvalue of multiplicity M has an M-dimensional
> eigenspace with no preferred basis. So, any M-dimensional  (unitary) change
> of basis is permitted. That's what give rise to the lack of reproducibility
> across architectures. The choice of basis appears to use different
> heuristics on 32-bit windows than on 64-bit Windows or Linux machines. As a
> result, I can't include the tests I'd like as part of a CRAN submission.
>
> On Thu, May 17, 2018, 2:29 PM William Dunlap  wrote:
>
>> Your explanation needs to be a bit more general in the case of identical
>> eigenvalues - each distinct eigenvalue has an associated subspace, whose
>> dimension is the number repeats of that eigenvalue and the eigenvectors for
>> that eigenvalue are an orthonormal basis for that subspace.  (With no
>> repeated eigenvalues this gives your 'unique up to sign'.)
>>
>> E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of 0
>>
>>   > x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
>>   > x
>>[,1] [,2] [,3] [,4] [,5]
>>   [1,]10001
>>   [2,]01001
>>   [3,]00101
>>   [4,]00000
>>   [5,]11103
>> the following give valid but different (by more than sign) eigen vectors
>>
>> e1 <- structure(list(values = c(4, 1, 0.999, 0,
>> -2.22044607159862e-16
>> ), vectors = structure(c(-0.288675134594813, -0.288675134594813,
>> -0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
>> -0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
>> -0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
>> -0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values",
>> "vectors"), class = "eigen")
>> e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
>> vectors = structure(c(0.288675134594813, 0.288675134594813,
>> 0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
>> 0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
>> 0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
>> -1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5,
>> 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values", "vectors"
>> ), class = "eigen")
>>
>> I.e.,
>> > all.equal(crossprod(e1$vectors), diag(5), tol=0)
>> [1] "Mean relative difference: 1.407255e-15"
>> > all.equal(crossprod(e2$vectors), diag(5), tol=0)
>> [1] "Mean relative difference: 3.856478e-15"
>> > all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)
>> [1] "Mean relative difference: 1.110223e-15"
>> > all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)
>> [1] "Mean relative difference: 9.069735e-16"
>>
>> > e1$vectors
>>[,1]   [,2]  [,3] [,4] [,5]
>> [1,] -0.2886751  0.000  8.164966e-010 -0.5
>> [2,] -0.2886751  0.7071068 -4.082483e-010 -0.5
>> [3,] -0.2886751 -0.7071068 -4.082483e-010 -0.5
>> [4,]  0.000  0.000  0.00e+00   -1  0.0
>> [5,] -0.8660254  0.000 -6.106227e-160  0.5
>> > e2$vectors
>>   [,1]  [,2]  [,3] [,4] [,5]
>> [1,] 0.2886751 -7.844376e-01  2.265489e-010 -0.5
>> [2,] 0.2886751  5.884158e-01  5.660684e-010 -0.5
>> [3,] 0.2886751  1.960217e-01 -7.926173e-010 -0.5
>> [4,] 0.000  0.00e+00  0.00e+00   -1  0.0
>> [5,] 0.8660254  4.464109e-17 -1.112441e-160  0.5
>>
>>
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Thu, May 17, 2018 at 10:14 AM, Martin Maechler <
>> maech...@stat.math.ethz.ch> wrote:
>>
>>> > Duncan Murdoch 
>>> > on Thu, 17 May 2018 12:13:01 -0400 writes:
>>>
>>> > On 17/05/2018 11:53 AM, Martin Maechler wrote:
>>> >>> Kevin Coombes ... on Thu, 17
>>> >>> May 2018 11:21:23 -0400 writes:
>>>
>>> >>[..]
>>>
>>> >> > [3] Should the documentation (man page) for "eigen" or
>>> >> > "mvrnorm" include a warning that the results can change
>>> >> > from machine to machine (or between things like 32-bit and
>>> >> > 64-bit R on the same machine) because of difference in
>>> >> > linear algebra 

Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-17 Thread Kevin Coombes
Yes; I'm pretty sure that it is exactly the repeated eigenvalues that are
the issue. The matrices I am using are all nonsingular, and the various
algorithms have no problem computing the eigenvalues correctly (up to
numerical errors that I can bound and thus account for on tests by rounding
appropriately). But an eigenvalue of multiplicity M has an M-dimensional
eigenspace with no preferred basis. So, any M-dimensional  (unitary) change
of basis is permitted. That's what give rise to the lack of reproducibility
across architectures. The choice of basis appears to use different
heuristics on 32-bit windows than on 64-bit Windows or Linux machines. As a
result, I can't include the tests I'd like as part of a CRAN submission.

On Thu, May 17, 2018, 2:29 PM William Dunlap  wrote:

> Your explanation needs to be a bit more general in the case of identical
> eigenvalues - each distinct eigenvalue has an associated subspace, whose
> dimension is the number repeats of that eigenvalue and the eigenvectors for
> that eigenvalue are an orthonormal basis for that subspace.  (With no
> repeated eigenvalues this gives your 'unique up to sign'.)
>
> E.g., for the following 5x5 matrix with two eigenvalues of 1 and two of 0
>
>   > x <- tcrossprod( cbind(c(1,0,0,0,1),c(0,1,0,0,1),c(0,0,1,0,1)) )
>   > x
>[,1] [,2] [,3] [,4] [,5]
>   [1,]10001
>   [2,]01001
>   [3,]00101
>   [4,]00000
>   [5,]11103
> the following give valid but different (by more than sign) eigen vectors
>
> e1 <- structure(list(values = c(4, 1, 0.999, 0,
> -2.22044607159862e-16
> ), vectors = structure(c(-0.288675134594813, -0.288675134594813,
> -0.288675134594813, 0, -0.866025403784439, 0, 0.707106781186547,
> -0.707106781186547, 0, 0, 0.816496580927726, -0.408248290463863,
> -0.408248290463863, 0, -6.10622663543836e-16, 0, 0, 0, -1, 0,
> -0.5, -0.5, -0.5, 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values",
> "vectors"), class = "eigen")
> e2 <- structure(list(values = c(4, 1, 1, 0, -2.29037708937563e-16),
> vectors = structure(c(0.288675134594813, 0.288675134594813,
> 0.288675134594813, 0, 0.866025403784438, -0.784437556312061,
> 0.588415847923579, 0.196021708388481, 0, 4.46410900710223e-17,
> 0.22654886208902, 0.566068420404321, -0.79261728249334, 0,
> -1.11244069540181e-16, 0, 0, 0, -1, 0, -0.5, -0.5, -0.5,
> 0, 0.5), .Dim = c(5L, 5L))), .Names = c("values", "vectors"
> ), class = "eigen")
>
> I.e.,
> > all.equal(crossprod(e1$vectors), diag(5), tol=0)
> [1] "Mean relative difference: 1.407255e-15"
> > all.equal(crossprod(e2$vectors), diag(5), tol=0)
> [1] "Mean relative difference: 3.856478e-15"
> > all.equal(e1$vectors %*% diag(e1$values) %*% t(e1$vectors), x, tol=0)
> [1] "Mean relative difference: 1.110223e-15"
> > all.equal(e2$vectors %*% diag(e2$values) %*% t(e2$vectors), x, tol=0)
> [1] "Mean relative difference: 9.069735e-16"
>
> > e1$vectors
>[,1]   [,2]  [,3] [,4] [,5]
> [1,] -0.2886751  0.000  8.164966e-010 -0.5
> [2,] -0.2886751  0.7071068 -4.082483e-010 -0.5
> [3,] -0.2886751 -0.7071068 -4.082483e-010 -0.5
> [4,]  0.000  0.000  0.00e+00   -1  0.0
> [5,] -0.8660254  0.000 -6.106227e-160  0.5
> > e2$vectors
>   [,1]  [,2]  [,3] [,4] [,5]
> [1,] 0.2886751 -7.844376e-01  2.265489e-010 -0.5
> [2,] 0.2886751  5.884158e-01  5.660684e-010 -0.5
> [3,] 0.2886751  1.960217e-01 -7.926173e-010 -0.5
> [4,] 0.000  0.00e+00  0.00e+00   -1  0.0
> [5,] 0.8660254  4.464109e-17 -1.112441e-160  0.5
>
>
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, May 17, 2018 at 10:14 AM, Martin Maechler <
> maech...@stat.math.ethz.ch> wrote:
>
>> > Duncan Murdoch 
>> > on Thu, 17 May 2018 12:13:01 -0400 writes:
>>
>> > On 17/05/2018 11:53 AM, Martin Maechler wrote:
>> >>> Kevin Coombes ... on Thu, 17
>> >>> May 2018 11:21:23 -0400 writes:
>>
>> >>[..]
>>
>> >> > [3] Should the documentation (man page) for "eigen" or
>> >> > "mvrnorm" include a warning that the results can change
>> >> > from machine to machine (or between things like 32-bit and
>> >> > 64-bit R on the same machine) because of difference in
>> >> > linear algebra modules? (Possibly including the statement
>> >> > that "set.seed" won't save you.)
>>
>> >> The problem is that most (young?) people do not read help
>> >> pages anymore.
>> >>
>> >> help(eigen) has contained the following text for years,
>> >> and in spite of your good analysis of the problem you
>> >> seem to not have noticed the last semi-paragraph:
>> >>
>> >>> Value:
>> >>>
>> >>> The spectral decomposition of ‘x’ is returned as a list
>> >>> with components
>> >>>
>> >>> values: a vector containing the p eigenvalues of 

Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-17 Thread Martin Maechler
> Duncan Murdoch 
> on Thu, 17 May 2018 12:13:01 -0400 writes:

> On 17/05/2018 11:53 AM, Martin Maechler wrote:
>>> Kevin Coombes ... on Thu, 17
>>> May 2018 11:21:23 -0400 writes:

>>[..]

>> > [3] Should the documentation (man page) for "eigen" or
>> > "mvrnorm" include a warning that the results can change
>> > from machine to machine (or between things like 32-bit and
>> > 64-bit R on the same machine) because of difference in
>> > linear algebra modules? (Possibly including the statement 
>> > that "set.seed" won't save you.)

>> The problem is that most (young?) people do not read help
>> pages anymore.
>> 
>> help(eigen) has contained the following text for years,
>> and in spite of your good analysis of the problem you
>> seem to not have noticed the last semi-paragraph:
>> 
>>> Value:
>>> 
>>> The spectral decomposition of ‘x’ is returned as a list
>>> with components
>>> 
>>> values: a vector containing the p eigenvalues of ‘x’,
>>> sorted in _decreasing_ order, according to ‘Mod(values)’
>>> in the asymmetric case when they might be complex (even
>>> for real matrices).  For real asymmetric matrices the
>>> vector will be complex only if complex conjugate pairs
>>> of eigenvalues are detected.
>>> 
>>> vectors: either a p * p matrix whose columns contain the
>>> eigenvectors of ‘x’, or ‘NULL’ if ‘only.values’ is
>>> ‘TRUE’.  The vectors are normalized to unit length.
>>> 
>>> Recall that the eigenvectors are only defined up to a
>>> constant: even when the length is specified they are
>>> still only defined up to a scalar of modulus one (the
>>> sign for real matrices).
>> 
>> It's not a warning but a "recall that" .. maybe because
>> the author already assumed that only thorough users would
>> read that and for them it would be a recall of something
>> they'd have learned *and* not entirely forgotten since
>> ;-)
>> 

> I don't think you're really being fair here: the text in
> ?eigen doesn't make clear that eigenvector values are not
> reproducible even within the same version of R, and
> there's nothing in ?mvrnorm to suggest it doesn't give
> reproducible results.

Ok, I'm sorry ... I definitely did not want to be unfair.

I've always thought the remark in eigen was sufficient,  but I'm
probably wrong and we should add text explaining that it
practically means that eigenvectors are only defined up to sign
switches (in the real case) and hence results depend on the
underlying {Lapack + BLAS} libraries and therefore are platform
dependent.

Even further, we could consider (optionally, by default FALSE)
using defining a deterministic scheme for postprocessing the current
output of eigen such that at least for the good cases where all
eigenspaces are 1-dimensional, the postprocessing would result
in reproducible signs, by e.g., ensuring the first non-zero
entry of each eigenvector to be positive.

MASS::mvrnorm()  and  mvtnorm::rmvnorm() both use "eigen",
whereas mvtnorm::rmvnorm()  *does* have  method = "chol" which
AFAIK does not suffer from such problems.

OTOH, the help page of MASS::mvrnorm() mentions the Cholesky
alternative but prefers eigen for better stability (without
saying more).

In spite of that, my personal recommendation would be to use

  mvtnorm::rmvnorm(.., method = "chol")

{ or the 2-3 lines of R code to the same thing without an extra package,
  just using rnorm(), chol() and simple matrix operations }

because in simulations I'd expect the var-cov matrix  Sigma to
be far enough away from singular for chol() to be stable.

Martin

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-17 Thread Duncan Murdoch

On 17/05/2018 11:53 AM, Martin Maechler wrote:

Kevin Coombes 
 on Thu, 17 May 2018 11:21:23 -0400 writes:


 > Hi, I wrote and maintain the Thresher package. It includes
 > code to do simulations. In the "tests" directory of the
 > package, I do some simple simulations and run the main
 > algorithm, then write out summaries of the results

 > The initial submission of the package to CRAN was delayed
 > because the "Rout.save" files matched the "Rout" files on
 > 64-bit R but *not* on 32-bit R on Windows. After
 > investigating, I realized that when my simulation code
 > called "MASS::mvrnorm", I got different results from
 > 64-bit and 32-bit versions of R on the same machine.
 > Pushing further, I determined that this was happening
 > because mvrnorm used "eigen" to compute the eigenvalues
 > and eigenvectors, and "eigen" itself gave different
 > answers in the two R versions..

 > The underlying issue (mathematically) is that the
 > correlation/covariance matrix I was using had repeated
 > eigenvalues, and so there is no unique choice of basis for
 > the associated eigenspace. This observation suggests that
 > the issue is potentially more general than 32-bit versus
 > 64-bit; the results will depend on the implementation of
 > the eigen-decomposition in whatever linear algebra module
 > is compiled along with R, so it can change from machine to
 > machine.

 > I "solved" (well, worked around) the immediate problem
 > with package submission by changing the test code to not
 > write out anything that might differ between versions.

 > With all of that as background, here are my main
 > questions:

 > [1] Is there any way to put something into the "tests"
 > directory that would allow me to use these simulations for
 > what computer scientists call regression testing? (That
 > is, to make sure my changes to the code haven't changed
 > results in an unexpected way.)

 > [2] Should there be a flag or instruction to R CMD check
 > that says to only run or interpret this particular test on
 > a specific version or machine? (Or is there already such a
 > flag that I don't know about?)

 > [3] Should the documentation (man page) for "eigen" or
 > "mvrnorm" include a warning that the results can change
 > from machine to machine (or between things like 32-bit and
 > 64-bit R on the same machine) because of difference in
 > linear algebra modules? (Possibly including the statement
 > that "set.seed" won't save you.)

The problem is that most (young?) people do not read help pages
anymore.

help(eigen) has contained the following text for years, and in
spite of your good analysis of the problem you seem to not have
noticed the last semi-paragraph:


Value:

  The spectral decomposition of ‘x’ is returned as a list with
  components

   values: a vector containing the p eigenvalues of ‘x’, sorted in
   _decreasing_ order, according to ‘Mod(values)’ in the
   asymmetric case when they might be complex (even for real
   matrices).  For real asymmetric matrices the vector will be
   complex only if complex conjugate pairs of eigenvalues are
   detected.

  vectors: either a p * p matrix whose columns contain the eigenvectors
   of ‘x’, or ‘NULL’ if ‘only.values’ is ‘TRUE’.  The vectors
   are normalized to unit length.

   Recall that the eigenvectors are only defined up to a
   constant: even when the length is specified they are still
   only defined up to a scalar of modulus one (the sign for real
   matrices).


It's not a warning but a "recall that" .. maybe because the
author already assumed that only thorough users would read that
and for them it would be a recall of something they'd have
learned *and* not entirely forgotten since ;-)



I don't think you're really being fair here:  the text in ?eigen doesn't 
make clear that eigenvector values are not reproducible even within the 
same version of R, and there's nothing in ?mvrnorm to suggest it doesn't 
give reproducible results.


As to the other questions:

[1] You can add additional test directories and only test them under 
your own controlled conditions:  see the --test-dir argument to R CMD 
check.  You could also make tests in the standard directory conditional 
on environment variables.


[2] You can find out details of the current machine using the .Platform 
and version variables, and make tests conditional on particular values 
of those.  I'd recommend limiting such tests to your own personal runs 
(using [1]) or not including saved output, because CRAN will run the 
tests on multiple platforms.


Duncan Murdoch

__
R-package-devel@r-project.org mailing list

Re: [R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-17 Thread Martin Maechler
> Kevin Coombes 
> on Thu, 17 May 2018 11:21:23 -0400 writes:

> Hi, I wrote and maintain the Thresher package. It includes
> code to do simulations. In the "tests" directory of the
> package, I do some simple simulations and run the main
> algorithm, then write out summaries of the results

> The initial submission of the package to CRAN was delayed
> because the "Rout.save" files matched the "Rout" files on
> 64-bit R but *not* on 32-bit R on Windows. After
> investigating, I realized that when my simulation code
> called "MASS::mvrnorm", I got different results from
> 64-bit and 32-bit versions of R on the same machine.
> Pushing further, I determined that this was happening
> because mvrnorm used "eigen" to compute the eigenvalues
> and eigenvectors, and "eigen" itself gave different
> answers in the two R versions..

> The underlying issue (mathematically) is that the
> correlation/covariance matrix I was using had repeated
> eigenvalues, and so there is no unique choice of basis for
> the associated eigenspace. This observation suggests that
> the issue is potentially more general than 32-bit versus
> 64-bit; the results will depend on the implementation of
> the eigen-decomposition in whatever linear algebra module
> is compiled along with R, so it can change from machine to
> machine.

> I "solved" (well, worked around) the immediate problem
> with package submission by changing the test code to not
> write out anything that might differ between versions.

> With all of that as background, here are my main
> questions:

> [1] Is there any way to put something into the "tests"
> directory that would allow me to use these simulations for
> what computer scientists call regression testing? (That
> is, to make sure my changes to the code haven't changed
> results in an unexpected way.)

> [2] Should there be a flag or instruction to R CMD check
> that says to only run or interpret this particular test on
> a specific version or machine? (Or is there already such a
> flag that I don't know about?)

> [3] Should the documentation (man page) for "eigen" or
> "mvrnorm" include a warning that the results can change
> from machine to machine (or between things like 32-bit and
> 64-bit R on the same machine) because of difference in
> linear algebra modules? (Possibly including the statement
> that "set.seed" won't save you.)

The problem is that most (young?) people do not read help pages
anymore.

help(eigen) has contained the following text for years, and in
spite of your good analysis of the problem you seem to not have
noticed the last semi-paragraph:

> Value:
> 
>  The spectral decomposition of ‘x’ is returned as a list with
>  components
> 
>   values: a vector containing the p eigenvalues of ‘x’, sorted in
>   _decreasing_ order, according to ‘Mod(values)’ in the
>   asymmetric case when they might be complex (even for real
>   matrices).  For real asymmetric matrices the vector will be
>   complex only if complex conjugate pairs of eigenvalues are
>   detected.
> 
>  vectors: either a p * p matrix whose columns contain the eigenvectors
>   of ‘x’, or ‘NULL’ if ‘only.values’ is ‘TRUE’.  The vectors
>   are normalized to unit length.
> 
>   Recall that the eigenvectors are only defined up to a
>   constant: even when the length is specified they are still
>   only defined up to a scalar of modulus one (the sign for real
>   matrices).

It's not a warning but a "recall that" .. maybe because the
author already assumed that only thorough users would read that
and for them it would be a recall of something they'd have
learned *and* not entirely forgotten since ;-)

Martin Maechler
ETH Zurich

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[R-pkg-devel] mvrnorm, eigen, tests, and R CMD check

2018-05-17 Thread Kevin Coombes
Hi,

I wrote and maintain the Thresher package. It includes code to do
simulations. In the "tests" directory of the package, I do some simple
simulations and run the main algorithm, then write out summaries of the
results

The initial submission of the package to CRAN was delayed because the
"Rout.save" files matched the "Rout" files on 64-bit R but *not*  on 32-bit
R on Windows. After investigating, I realized that when my simulation code
called "MASS::mvrnorm", I got different results from 64-bit and 32-bit
versions of R on the same machine.  Pushing further, I determined that this
was happening because mvrnorm used "eigen" to compute the eigenvalues and
eigenvectors, and "eigen" itself gave different answers in the two R
versions..

The underlying issue (mathematically) is that the correlation/covariance
matrix I was using had repeated eigenvalues, and so there is no unique
choice of basis for the associated eigenspace. This observation suggests
that the issue is potentially more general than 32-bit versus 64-bit; the
results will depend on the implementation of the eigen-decomposition in
whatever linear algebra module is compiled along with R, so it can change
from machine to machine.

I "solved" (well, worked around) the immediate problem with package
submission by changing the test code to not write out anything that might
differ between versions.

With all of that as background, here are my main questions:

[1] Is there any way to put something into the "tests" directory that would
allow me to use these simulations for what computer scientists call
regression testing? (That is, to make sure my changes to the code haven't
changed results in an unexpected way.)

[2] Should there be a flag or instruction to R CMD check that says to only
run or interpret this particular test on a specific version or machine? (Or
is there already such a flag that I don't know about?)

[3] Should the documentation (man page) for "eigen" or "mvrnorm" include a
warning that the results can change from machine to machine (or between
things like 32-bit and 64-bit R on the same machine) because of difference
in linear algebra modules? (Possibly including the statement that
"set.seed" won't save you.)

You can reproduce my example by running this code in different R
versions/machines:

sig <- matrix(0, 16, 16)
sig[1:10, 1:10] <- 0.5
sig[11:16,11:16] <- 0.5
diag(sig) <- 1
eigen(sig)

Best,
  Kevin

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel