[Rd] Garbage collection of seemingly PROTECTed pairlist
I want to write an R function using R's C interface that takes a 2-column matrix of increasing, non-overlapping integer intervals and returns a list with those intervals plus some added intervals, such that there are no gaps. For example, it should take the matrix rbind(c(5L, 6L), c(7L, 10L), c(20L, 30L)) and return list(c(5L, 6L), c(7L, 10L), c(11L, 19L), c(20L, 30L)). Because the output is of variable length, I use a pairlist (because it is growable) and then I call Rf_PairToVectorList() at the end to make it into a regular list. I'm getting a strange garbage collection error. My PROTECTed pairlist prlst gets garbage collected away and causes a memory leak error when I try to access it. Here's my code. #include SEXP C_int_mat_nth_row_nrnc(int *int_mat_int, int nr, int nc, int n) { SEXP out = PROTECT(Rf_allocVector(INTSXP, nc)); int *out_int = INTEGER(out); if (n <= 0 | n > nr) { for (int i = 0; i != nc; ++i) { out_int[i] = NA_INTEGER; } } else { for (int i = 0; i != nr; ++i) { out_int[i] = int_mat_int[n - 1 + i * nr]; } } UNPROTECT(1); return out;} SEXP C_make_len2_int_vec(int first, int second) { SEXP out = PROTECT(Rf_allocVector(INTSXP, 2)); int *out_int = INTEGER(out); out_int[0] = first; out_int[1] = second; UNPROTECT(1); return out;} SEXP C_fullocate(SEXP int_mat) { int nr = Rf_nrows(int_mat), *int_mat_int = INTEGER(int_mat); int last, row_num; // row_num will be 1-indexed SEXP prlst0cdr = PROTECT(C_int_mat_nth_row_nrnc(int_mat_int, nr, 2, 1)); SEXP prlst = PROTECT(Rf_list1(prlst0cdr)); SEXP prlst_tail = prlst; last = INTEGER(prlst0cdr)[1]; row_num = 2; while (row_num <= nr) { Rprintf("row_num: %i\n", row_num); SEXP row = PROTECT(C_int_mat_nth_row_nrnc(int_mat_int, nr, 2, row_num)); Rf_PrintValue(prlst); // This is where the error occurs int *row_int = INTEGER(row); if (row_int[0] == last + 1) { Rprintf("here1"); SEXP next = PROTECT(Rf_list1(row)); prlst_tail = SETCDR(prlst_tail, next); last = row_int[1]; UNPROTECT(1); ++row_num; } else { Rprintf("here2"); SEXP next_car = PROTECT(C_make_len2_int_vec(last + 1, row_int[0] - 1)); SEXP next = PROTECT(Rf_list1(next_car)); prlst_tail = SETCDR(prlst_tail, next); last = row_int[0] - 1; UNPROTECT(2); } UNPROTECT(1); } SEXP out = PROTECT(Rf_PairToVectorList(prlst)); UNPROTECT(3); return out;} As you can see, I have some diagnostic print statements in there. The offending line is line 40, which I have marked with a comment of // This is where the error occurs. I have a minimal reproducible package at https://github.com/rorynolan/testpkg and I have run R CMD CHECK with valgrind using GitHub actions, the results of which are at https://github.com/rorynolan/testpkg/runs/1076595757?check_suite_focus=true. That's where I found out which line is causing the error. This function works as expected sometimes, and then sometimes this issue appears. This lends weight to the suspicion that it's a garbage collection issue. I really want to know what my mistake is. I'm not that interested in alternative implementations; I want to understand the mistake that I'm making so that I can avoid making it in future. I have asked the question on stackoverflow to little avail, but the discussion there may prove helpful. https://stackoverflow.com/questions/63759604/garbage-collection-of-seemingly-protected-pairlist. Thanks, Rory [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Including full text of open source licenses in a package
Hi all, R-exts currently requests that package authors don't include copies of standard licenses: > Whereas you should feel free to include a license file in your source > distribution, please do > not arrange to install yet another copy of the GNU COPYING or COPYING.LIB > files but > refer to the copies on https://www.R-project.org/Licenses/ and included in > the R distribution > (in directory share/licenses). Since files named LICENSE or LICENCE will be > installed, > do not use these names for standard license files. I'd like to request that this condition be removed because it makes it overly difficult to ensure that every version of your package (source, tar.gz, binary, and installed) includes the full text of the license. This is important because most open source licenses explicitly require that you include the full text of the license. For example, the GPL faq (http://www.gnu.org/licenses/gpl-faq.html#WhyMustIInclude) states: > Why does the GPL require including a copy of the GPL with every copy of the > program? > (#WhyMustIInclude) > > Including a copy of the license with the work is vital so that everyone who > gets a copy of > the program can know what their rights are. > > It might be tempting to include a URL that refers to the license, instead of > the license > itself. But you cannot be sure that the URL will still be valid, five years > or ten years from > now. Twenty years from now, URLs as we know them today may no longer exist. > > The only way to make sure that people who have copies of the program will > continue > to be able to see the license, despite all the changes that will happen in > the network, > is to include a copy of the license in the program. This analysis by an open source lawyer, https://writing.kemitchell.com/2016/09/21/MIT-License-Line-by-Line.html#notice-condition, reinforces the same message for the MIT license. Currently we've been working around this limitation by putting a markdown version of the license in LICENSE.md and then adding that to .Rbuildignore (this ensures that the source version on GitHub includes the license even if the CRAN version does not). Ideally, as well as allowing us to include full text of licenses in LICENSE or LICENSE.txt, a LICENSE.md at the top-level of the package would also be explicitly permitted. Hadley -- http://hadley.nz __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] CRAN metadata broken?
E.g. in https://cran.r-project.org/bin/macosx/contrib/4.0/PACKAGES there is Package: stringi Version: 1.5.3 but there is no such binary at https://cran.r-project.org/bin/macosx/contrib/4.0/ FYI, G. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] more Matrix weirdness
Abby, my answer was too concise. The thrust is that even if you define a method for "[<-" with signature x="matrix" and value ="Matrix", for example, it will never be used since "matrix" is S3. If instead x="someS4class" then the S4 method will be invoked. There may be cases when changing the class of the left-hand side make sense (such as one subclass of "Matrix" to another) but certainly not for the base R vector classes. Georgi Boshnakov -Original Message- From: Abby Spurdle Sent: 11 September 2020 03:03 To: Martin Maechler Cc: Georgi Boshnakov ; r-devel@r-project.org Subject: Re: [Rd] more Matrix weirdness > > "These operators are also implicit S4 generics, but as > > primitives, S4 methods will be dispatched only on S4 > > objects ‘x’." > Yes, exactly, very well found, Georgi! I'm sorry Martin, but I don't understand your point here. I'm assuming that you want the (S3) matrix, x, to be converted to an (S4) Matrix. However, this is not a question of method dispatch, as such. But rather a question of type conversion (integer to numeric to complex, etc). Specifically, can/should automatic type conversion, convert an S3 data type to an S4 data type, even where user-defined data types are involved? __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] more Matrix weirdness
If the current semantics is to be kept, one approach might be to insert in the internal code of "[<-" something like the equivalent of If(isS4(value)) value <- coreData(value) with the contract that classes that wish to be treated as equivalent to basic vector classes define suitable method for coreData(). So, a matrix class would return a plain matrix, a vector class - a vector, etc. This is akin to coredata() for time series in package 'zoo' and others. Georgi Boshnakov -Original Message- From: Martin Maechler Sent: 10 September 2020 13:48 To: Georgi Boshnakov Cc: r-devel@r-project.org Subject: Re: [Rd] more Matrix weirdness > Georgi Boshnakov > on Wed, 9 Sep 2020 10:48:56 + writes: > I think that this is because `[<-` dispatches on S4 > methods only if the first argument is S4. ?"[<-" says: > "These operators are also implicit S4 generics, but as > primitives, S4 methods will be dispatched only on S4 > objects ‘x’." > Georgi Boshnakov Yes, exactly, very well found, Georgi! This is something I would have wanted different for years, exactly because of several such problems with the Matrix package of which I'm the maintainer. Long time ago I had also looked if I saw how to fix this behavior inside 'methods' (i.e. the S4 infrastructure pkg) and I think in this case also, inside R's basic C code. At the time (~ 10 yrs ago) I gave up, but don't remember why. I'm happy if you create a formal bug report, possibly "wishlist" as it is documented behavior, for this infelicity... and then I will probably add the 'HELPWANTED' keyword. Martin > -Original Message- > Message: 19 Date: Tue, 8 Sep 2020 22:04:44 -0400 From: Ben > Bolker To: r-devel > Subject: [Rd] more Matrix > weirdness Message-ID: > > Content-Type: text/plain; charset="utf-8"; Format="flowed" >Am I being too optimistic in expecting this (mixing and > matching matrices and Matrices) to work? If x is a matrix > and m is a Matrix, replacing a commensurately sized > sub-matrix of x with m throws "number of items to replace > is not a multiple of replacement length" ... > x <- matrix(0,nrow=3,ncol=10, > dimnames=list(letters[1:3],LETTERS[1:10])) rr <- > c("a","b","c") cc <- c("B","C","E") m <- > Matrix(matrix(1:9,3,3)) x[rr,cc] <- m > cheers Ben Bolker > -- > Message: 20 Date: Wed, 9 Sep 2020 07:03:47 +0100 From: Rui > Barradas To: Ben Bolker > , r-devel > Subject: Re: [Rd] more Matrix weirdness Message-ID: > <7037975c-22b6-6eca-d871-743eead53...@sapo.pt> > Content-Type: text/plain; charset="utf-8"; Format="flowed" > Hello, > R 4.0.2 on Ubuntu 20.04, sessionInfo() below. > I can reproduce this, sort of. The error I'm getting is > different: > x[rr, cc] <- m #Error in x[rr, cc] <- m : replacement has > length zero > But if I check lengths and dimensions, they are > identical(). > identical(length(x[rr, cc]), length(m)) #[1] TRUE > identical(dim(x[rr, cc]), dim(m)) #[1] TRUE > What works is > x[rr, cc] <- as.matrix(m) > I ran Ben's code on RStudio 1.3.1073, the following is run > with Rscript and the error message is the same. > rui@rui:~$ Rscript --vanilla rhelp.R R version 4.0.2 > (2020-06-22) Platform: x86_64-pc-linux-gnu (64-bit) > Running under: Ubuntu 20.04.1 LTS > Matrix products: default BLAS: > /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: > /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 > locale: [1] LC_CTYPE=pt_PT.UTF-8 LC_NUMERIC=C [3] > LC_TIME=pt_PT.UTF-8 LC_COLLATE=pt_PT.UTF-8 [5] > LC_MONETARY=pt_PT.UTF-8 LC_MESSAGES=pt_PT.UTF-8 [7] > LC_PAPER=pt_PT.UTF-8 LC_NAME=C [9] LC_ADDRESS=C > LC_TELEPHONE=C [11] LC_MEASUREMENT=pt_PT.UTF-8 > LC_IDENTIFICATION=C > attached base packages: [1] stats graphics grDevices utils > datasets methods base > other attached packages: [1] Matrix_1.2-18 > loaded via a namespace (and not attached): [1] > compiler_4.0.2 grid_4.0.2 lattice_0.20-41 Error in x[rr, > cc] <- m : number of items to replace is not a multiple of > replacement length Execution halted > Hope this helps, > Rui Barradas > Às 03:04 de 09/09/20, Ben Bolker escreveu: >> Am I being too optimistic in expecting this (mixing and >> matching matrices and Matrices) to work? If x is a >> matrix and m is a Matrix, replacing a commensurately >> sized sub-matrix of x with m throws "number of items to >> replace is not a multiple of replacement length" ... >> >> x <- matrix(0,nrow=3,ncol=10, >> dimnames=list(letters[1:3],LETTERS[1:10])) rr <- >> c("a","b","c") cc <- c("B","C","E") m <- >> Matrix(matrix(1:9,3,3)) x[rr,cc] <- m >> >> cheers Ben Bolker >> >> ___