Thank you, Uwe, for tagging me in. Huge thank you to Ivan for the detailed analysis. This reflects what Max Coulter found in https://github.com/thibautjombart/adegenet/issues/363. I just merged in https://github.com/thibautjombart/adegenet/issues/363 yesterday and I believe that should fix the issue.
I am going to run a test using the https://hub.docker.com/r/rocker/r-devel-san/ image to confirm that this is fixed before I send it off to CRAN.

Best,
Zhian

On Tue, Feb 4, 2025 at 4:01 AM Iñaki Ucar <iu...@fedoraproject.org> wrote:

> On Tue, 4 Feb 2025 at 12:56, Uwe Ligges <lig...@statistik.tu-dortmund.de>
> wrote:
> >
> >
> >
> > On 04.02.2025 12:46, Iñaki Ucar wrote:
> > > @Ivan: Excellent analysis as always.
> > >
> > > @Bernd: So what can **you** do about it? You are using adegenet
> > > correctly as Ivan pointed out, so IMHO CRAN should have requested
> > > adegenet's maintainer to fix this. But since it's your package that is
> > > on the line here, I would put that example inside a dontrun{} chunk
> > > for now, to avoid triggering this issue on CRAN, and I would report
> > > Ivan's analysis upstream.
> >
> > Putting it in dontrun is the wrong advice in any case, as we like to
> > track whether the underlying issue has been fixed. And both Professor
> > Ripley and I already asked to liaise with the adegenet maintainer to get
> > this fixed.
>
> Great, we obviously didn't know this, but only that Bernd's package is
> scheduled to be archived soon, thus my advice.
>
> Iñaki
>
> > So I wonder why the most relevant person (Zhian N. Kamvar)
> > was not included here (CCing now).
> > Ivan's analysis is as always extremely helpful, particularly for the
> > adegenet maintainer.
> >
> > Best,
> > Uwe
> >
> >
> > >
> > > Iñaki
> > >
> > >
> > > On Tue, 4 Feb 2025 at 12:30, Ivan Krylov via R-package-devel
> > > <r-package-devel@r-project.org> wrote:
> > >>
> > >> On Sun, 2 Feb 2025 22:56:47 +0000
> > >> Bernd.Gruber <bernd.gru...@canberra.edu.au> wrote:
> > >>
> > >>> READ of size 16 at 0x518000697ff0 thread T0
> > >>> #0 0x7f2e873ccfdf in bytesToDouble
> > >>> /tmp/RtmpNNPUz9/R.INSTALL3cef1f2b1bd39c/adegenet/src/snpbin.c:225:19
> > >>> #1 0x7f2e873ceca5 in snpbin2freq
> > >>> /tmp/RtmpNNPUz9/R.INSTALL3cef1f2b1bd39c/adegenet/src/snpbin.c:332:5
> > >>> #2 0x7f2e873ceca5 in snpbin_dotprod_freq
> > >>> /tmp/RtmpNNPUz9/R.INSTALL3cef1f2b1bd39c/adegenet/src/snpbin.c:447:5
> > >>> #3 0x7f2e873bba42 in GLdotProd
> > >>> /tmp/RtmpNNPUz9/R.INSTALL3cef1f2b1bd39c/adegenet/src/GLfunctions.c:42:14
> > >>
> > >> Ben Bolker is exactly right; the problem happens in the 'adegenet'
> > >> code. Why?
> > >>
> > >> bytesToDouble() is asked to unpack the bytes from the 'vecbytes' array
> > >> (26 bytes) into individual bits stored as doubles in the 'out' array.
> > >> The latter was allocated by the snpbin_dotprod_freq() function to
> > >> contain 199 elements [1]. Every byte must be unpacked into 8 bits, and
> > >> 199 is less than 26*8 = 208. Where did the values come from?
> > >>
> > >> The C function GLsumFreq() stores them unchanged from its arguments
> > >> [2], and those come from the SNPbin objects passed by R code [3] from
> > >> nLoc(x) and length(x$gen[[1]]@snp[[1]]). Where do they originate?
> > >>
> > >> The R traceback at the point of the crash is dartR.base::gl.pcoa ->
> > >> adegenet::glPca -> adegenet::glDotProd. The object 'possums.gl' of S4
> > >> class 'dartR' exported by 'dartR.base' appears valid: its .$n.loc is
> > >> exactly equal to length(.$gen[[1]]@snp[[1]]) * 8, so the allocation size
> > >> matches the packed binary content.
> > >>
> > >> The subset possums.gl[1:50,] that is used to perform PCA, on the other
> > >> hand, is invalid: length(possums.gl[1:50,]$gen[[1]]@snp[[1]]) is 26
> > >> instead of 25, which later causes bytesToDouble() to try to write extra
> > >> 8 doubles (64 bytes) into the buffer.
> > >>
> > >> This happens because trying to extract all SNPs from an SNPbin object
> > >> introduces an extra byte:
> > >>
> > >>   possums.gl@gen[[1]] |> _@snp |> lengths()
> > >>   # [1] 25 25
> > >>
> > >>   possums.gl@gen[[1]][rep(TRUE, nLoc(possums.gl@gen[[1]]))] |>
> > >>     _@snp |> lengths()
> > >>   # 26 26
> > >>
> > >> This can be traced to a bug in adegenet:::.subsetbin:
> > >>
> > >>   .subsetbin(as.raw(0xff), 1:8)
> > >>   # [1] ff 00 # <-- should be just 'ff'
> > >>
> > >>   xint <- as.integer(rawToBits(x)[i]) # may be not divisible by 8
> > >>   # so introduce padding: the following line gives 8 bits of padding
> > >>   # instead of 0 when length(xint) is divisible by 8
> > >>   zeroes <- 8 - (length(xint)%%8)
> > >>   # instead use something like:
> > >>   # zeroes <- (8 - (length(xint)%%8)) * (length(xint)%%8 > 0)
> > >>   # (could probably be golfed further)
> > >>   return(packBits(c(xint, rep(0L, zeroes))))
> > >>
> > >> But we're getting two bugs for the price of one, because even with a
> > >> 25-byte buffer, nLoc(.) == 199 would still result in an 8-byte
> > >> overflow. This is solely on the bytesToDouble() C function: it ought to
> > >> know to stop after writing *reslength elements into the 'vecres' array.
> > >>
> > >> I'm afraid there is no easy way to work around either of the bugs in
> > >> the dartR.base code.
> > >>
> > >> --
> > >> Best regards,
> > >> Ivan
> > >>
> > >> [1]
> > >> https://github.com/thibautjombart/adegenet/blob/c7287597155ab18989d892a72eff33cf8c288958/src/snpbin.c#L443-L444
> > >>
> > >> [2]
> > >> https://github.com/thibautjombart/adegenet/blob/c7287597155ab18989d892a72eff33cf8c288958/src/GLfunctions.c#L124
> > >>
> > >> [3]
> > >> https://github.com/thibautjombart/adegenet/blob/c7287597155ab18989d892a72eff33cf8c288958/R/glFunctions.R#L215-L216
> > >>
> > >> ______________________________________________
> > >> R-package-devel@r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> > >>
> > >
> > >
> >
>
> --
> Iñaki Úcar
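
As a rough illustration of the two fixes suggested in Ivan's analysis quoted above, here is a minimal, self-contained C sketch. It is not the adegenet code: padding_bits() and unpack_bits_bounded() are hypothetical names, and the least-significant-bit-first order is an assumption chosen to match R's rawToBits(). The first function computes a padding of 0 (rather than 8) when the bit count is already a multiple of 8; the second stops after writing the requested number of doubles, no matter how many packed bytes it is handed.

```c
#include <stddef.h>

/* Hypothetical helper: number of zero bits needed to pad `nbits`
 * up to a whole number of bytes. The outer "% 8" makes the result 0
 * (not 8) when nbits is already divisible by 8. */
size_t padding_bits(size_t nbits)
{
    return (8 - nbits % 8) % 8;
}

/* Hypothetical bounded unpacker: expands packed bytes into one double
 * per bit (least-significant bit first, an assumption) and never writes
 * more than `outlen` elements into `out`.
 * Returns the number of doubles actually written. */
size_t unpack_bits_bounded(const unsigned char *bytes, size_t nbytes,
                           double *out, size_t outlen)
{
    size_t written = 0;
    for (size_t i = 0; i < nbytes && written < outlen; i++) {
        for (int bit = 0; bit < 8 && written < outlen; bit++) {
            out[written++] = (double)((bytes[i] >> bit) & 1u);
        }
    }
    return written;
}
```

With 26 packed bytes and an output buffer of 199 doubles, as in the report, unpack_bits_bounded() writes exactly 199 values and leaves the remaining 9 bits unread instead of running past the end of the buffer.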