Hi Simon and Hugh,

That's extremely helpful, thank you for all the info. In the meantime I have 
found a better solution for those tests: they are meant to test the reordering 
of the clustering output, so the output doesn't need to have converged; hence 
all the warnings, because the clustering only runs briefly to produce some 
output. Your info will be very helpful for my further investigation into the 
variation between runs and between OSs.

I agree that the algorithm is likely to be sensitive to small differences at 
the start. But when it runs in its proper mode to get good output, rather than 
just running briefly for test output, it chooses the best of many different 
starting points before working towards a good solution, so the variation at 
the start has less of an impact.

Thanks
Louise
________________________________
From: Simon Urbanek <[email protected]>
Sent: Wednesday, 4 March 2026 11:01 am
To: Louise McMillan <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [R-pkg-devel] Different unit test results on MacOS

Louise,

TL;DR: this is not macOS-specific. Your test example is chaotic and thus will 
be influenced even by small changes in precision beyond what is guaranteed, 
i.e. your assumptions are not generally valid and so the tests don't work.

The full story: floating point operations are deterministic, but may not yield 
the same results if varying precision is used. In most cases the default 
precision is double precision (53-bit significand), which is guaranteed to work 
on all CPUs, but R has the option to use extended precision (long double) for 
some operations (like accumulators in sums/means) if available, which can be 
anywhere from 53 to 113 bits. Common CPUs derived from the Intel FPU use a 
64-bit significand, which is only slightly more than double precision, but 
enough to make a difference in some cases. The Arm CPUs used by Macs only 
support double precision, so any operations that are otherwise performed with 
extended precision will be different. The differences will be very small, but 
your algorithm seems to be extremely sensitive to them - I think you're just 
testing the wrong thing (the output even says that it doesn't converge), since 
the result is chaotic (i.e. small changes have a huge impact), not something 
you want to test.

To check what precision your R has, look at .Machine$longdouble.digits, which 
will give you the precision of long doubles, or NULL if there is no long 
double support in that build of R (given your results I bet you are using an 
Intel-based CPU with 64-bit precision). You can check whether your code is 
overly sensitive on your own machine by compiling R with --disable-long-double, 
which makes R use double precision only, and then running your code - it does 
produce very different results on your example.

Cheers,
Simon


> On 3/03/2026, at 11:36, Louise McMillan <[email protected]> wrote:
>
> Hi,
>
> My package is called clustord (GitHub latest version at 
> github.com/vuw-clustering/clustord, and version 2.0.0 pushed to CRAN 
> yesterday, 2nd March 2026).
>
> I have an odd problem: I added a function to my package along with an 
> extensive set of unit tests for it. The unit tests run correctly on Windows 
> and Linux, but half of one of the three test files runs differently on 
> MacOS than it does on Windows and Linux, and fails the tests.
>
> The package is a clustering package, and the new function is designed to be 
> able to reorder the output clusters in order of their cluster effect sizes. 
> The unit tests run the clustering algorithm and then the reorder function on 
> a simulated dataset and then check the output orderings against what I've 
> manually worked out the ordering should be.
>
> The dataset simulation process uses randomness, and the clustering algorithm 
> uses randomness, but the reordering does not. Each section of the test 
> script starts with set.seed(), to ensure the dataset is always the same, and 
> that seed should also fix the output of the clustering algorithm that runs 
> just after the dataset simulation, so the results should be the same on all 
> operating systems. This is why I'm so puzzled that almost all of the 
> versions of this test run the same on MacOS as on Windows and Linux, but 
> this particular version runs differently on MacOS, even though I set the 
> seed at the start of simulating the dataset for this specific test run.
>
> Since I do not have a Mac, it is difficult for me to debug it, though I can 
> see the error when the push to Github triggers the Github Actions check, 
> which runs on multiple OSs.
>
> The start of the section of the test script that's failing is:
>
> ------------------------------
>    library(clustord)
>    ## Dataset simulation
>    set.seed(30)
>    n <- 30
>    p <- 5
>    long_df_sim <- data.frame(Y=factor(sample(1:3,n*p,replace=TRUE)),
>                              ROW=rep(1:n,times=p),COL=rep(1:p,each=n))
>
>    xr1 <- runif(n, min=0, max=2)
>    xr2 <- sample(c("A","B"),size=n, replace=TRUE, prob=c(0.3,0.7))
>    xr3 <- factor(sample(1:4, size=n, replace=TRUE))
>
>    xc1 <- runif(p, min=-1, max=1)
>
>    long_df_sim$xr1 <- rep(xr1, times=5)
>    long_df_sim$xr2 <- rep(xr2, times=5)
>    long_df_sim$xr3 <- rep(xr3, times=5)
>    long_df_sim$xc1 <- rep(xc1, each=30)
>
>    ## Clustering algorithm
>    # OSM results --------------------------------------------------------------
>    ## Model 1 ----
>    orig <- clustord(Y~ROWCLUST*xr1+xr2*xr3+COL, model="OSM", RG=4,
>                     long_df=long_df_sim, nstarts=1, constraint_sum_zero=FALSE,
>                     control_EM=list(maxiter=3,maxiter_start=2,keep_all_params=TRUE))
> ------------------------------
>
> This section is just the dataset simulation and the clustering algorithm. The 
> reordering checks afterwards are failing, but I think it's more likely it's 
> because the clustering algorithm is somehow producing a different result on 
> the Mac than because the reordering (which is deterministic) is somehow 
> producing a different result on the Mac.
>
> If you display orig$out_parlist after running the above code, I expect the 
> $rowc values to be
>
> $rowc
>     rowc_1      rowc_2      rowc_3      rowc_4
> 0.00000000  0.08713465 -0.26123294  0.05820879
>
> I will keep investigating this myself, but if anyone has any suggestions why 
> the randomness might be working slightly differently on the Mac, or any other 
> possible causes for occasional mismatches between MacOS and other OSs, I 
> would really appreciate reading them.
>
> Thanks very much
> Louise
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>


______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
