[Rd] Garbage collection of seemingly PROTECTed pairlist

2020-09-11 Thread Rory Nolan
I want to write an R function using R's C interface that takes a 2-column
matrix of increasing, non-overlapping integer intervals and returns a list
with those intervals plus some added intervals, such that there are no
gaps. For example, it should take the matrix rbind(c(5L, 6L), c(7L, 10L),
c(20L, 30L)) and return list(c(5L, 6L), c(7L, 10L), c(11L, 19L), c(20L,
30L)). Because the output is of variable length, I use a pairlist (because
it is growable) and then I call Rf_PairToVectorList() at the end to make it
into a regular list.

I'm getting a strange garbage collection error. My PROTECTed pairlist prlst
gets garbage collected away and causes a memory leak error when I try to
access it.

Here's my code.

#include 


SEXP C_int_mat_nth_row_nrnc(int *int_mat_int, int nr, int nc, int n) {
  SEXP out = PROTECT(Rf_allocVector(INTSXP, nc));
  int *out_int = INTEGER(out);
  if (n <= 0 | n > nr) {
for (int i = 0; i != nc; ++i) {
  out_int[i] = NA_INTEGER;
}
  } else {
for (int i = 0; i != nr; ++i) {
  out_int[i] = int_mat_int[n - 1 + i * nr];
}
  }
  UNPROTECT(1);
  return out;}

SEXP C_make_len2_int_vec(int first, int second) {
  SEXP out = PROTECT(Rf_allocVector(INTSXP, 2));
  int *out_int = INTEGER(out);
  out_int[0] = first;
  out_int[1] = second;
  UNPROTECT(1);
  return out;}

SEXP C_fullocate(SEXP int_mat) {
  int nr = Rf_nrows(int_mat), *int_mat_int = INTEGER(int_mat);
  int last, row_num;  // row_num will be 1-indexed
  SEXP prlst0cdr = PROTECT(C_int_mat_nth_row_nrnc(int_mat_int, nr, 2, 1));
  SEXP prlst = PROTECT(Rf_list1(prlst0cdr));
  SEXP prlst_tail = prlst;
  last = INTEGER(prlst0cdr)[1];
  row_num = 2;
  while (row_num <= nr) {
Rprintf("row_num: %i\n", row_num);
SEXP row = PROTECT(C_int_mat_nth_row_nrnc(int_mat_int, nr, 2, row_num));
Rf_PrintValue(prlst);  // This is where the error occurs
int *row_int = INTEGER(row);
if (row_int[0] == last + 1) {
  Rprintf("here1");
  SEXP next = PROTECT(Rf_list1(row));
  prlst_tail = SETCDR(prlst_tail, next);
  last = row_int[1];
  UNPROTECT(1);
  ++row_num;
} else {
  Rprintf("here2");
  SEXP next_car = PROTECT(C_make_len2_int_vec(last + 1, row_int[0] - 1));
  SEXP next = PROTECT(Rf_list1(next_car));
  prlst_tail = SETCDR(prlst_tail, next);
  last = row_int[0] - 1;
  UNPROTECT(2);
}
UNPROTECT(1);
  }
  SEXP out = PROTECT(Rf_PairToVectorList(prlst));
  UNPROTECT(3);
  return out;}

As you can see, I have some diagnostic print statements in there. The
offending line is line 40, which I have marked with a comment of // This is
where the error occurs. I have a minimal reproducible package at
https://github.com/rorynolan/testpkg and I have run R CMD CHECK with
valgrind using GitHub actions, the results of which are at
https://github.com/rorynolan/testpkg/runs/1076595757?check_suite_focus=true.
That's where I found out which line is causing the error. This function
works as expected sometimes, and then sometimes this issue appears. This
lends weight to the suspicion that it's a garbage collection issue.

I really want to know what my mistake is. I'm not that interested in
alternative implementations; I want to understand the mistake that I'm
making so that I can avoid making it in future.

I have asked the question on stackoverflow to little avail, but the
discussion there may prove helpful.
https://stackoverflow.com/questions/63759604/garbage-collection-of-seemingly-protected-pairlist.



Thanks,

Rory

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Including full text of open source licenses in a package

2020-09-11 Thread Hadley Wickham
Hi all,

R-exts currently requests that package authors don't include copies of
standard licenses:

> Whereas you should feel free to include a license file in your source 
> distribution, please do
> not arrange to install yet another copy of the GNU COPYING or COPYING.LIB 
> files but
> refer to the copies on https://www.R-project.org/Licenses/ and included in 
> the R distribution
> (in directory share/licenses). Since files named LICENSE or LICENCE will be 
> installed,
> do not use these names for standard license files.

I'd like to request that this condition be removed because it makes it
overly difficult to ensure that every version of your package (source,
tar.gz, binary, and installed) includes the full text of the license.
This is important because most open source licenses explicitly require
that you include the full text of the license. For example, the GPL
faq (http://www.gnu.org/licenses/gpl-faq.html#WhyMustIInclude) states:

> Why does the GPL require including a copy of the GPL with every copy of the 
> program?
> (#WhyMustIInclude)
>
> Including a copy of the license with the work is vital so that everyone who 
> gets a copy of
> the program can know what their rights are.
>
> It might be tempting to include a URL that refers to the license, instead of 
> the license
> itself. But you cannot be sure that the URL will still be valid, five years 
> or ten years from
> now. Twenty years from now, URLs as we know them today may no longer exist.
>
> The only way to make sure that people who have copies of the program will 
> continue
> to be able to see the license, despite all the changes that will happen in 
> the network,
> is to include a copy of the license in the program.

This analysis by an open source lawyer,
https://writing.kemitchell.com/2016/09/21/MIT-License-Line-by-Line.html#notice-condition,
reinforces the same message for the MIT license.

Currently we've been working around this limitation by putting a
markdown version of the license in LICENSE.md and then adding that to
.Rbuildignore (this ensures that the source version on GitHub includes
the license even if the CRAN version does not). Ideally, as well as
allowing us to include full text of licenses in LICENSE or
LICENSE.txt, a LICENSE.md at the top-level of the package would also
be explicitly permitted.

Hadley

-- 
http://hadley.nz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] CRAN metadata broken?

2020-09-11 Thread Gábor Csárdi
E.g. in https://cran.r-project.org/bin/macosx/contrib/4.0/PACKAGES there is

Package: stringi
Version: 1.5.3

but there is no such binary at
https://cran.r-project.org/bin/macosx/contrib/4.0/

FYI, G.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] more Matrix weirdness

2020-09-11 Thread Georgi Boshnakov
Abby, my answer was too concise. The thrust is that even if you define a method 
for 
"[<-" with signature x="matrix" and value ="Matrix", for example, it will never 
be used since "matrix" is S3. 
If instead x="someS4class" then the S4 method will be invoked.

There may be cases when changing the class of the left-hand side make sense 
(such as one subclass of "Matrix" to another) but certainly not for the base R 
vector classes.


Georgi Boshnakov


-Original Message-
From: Abby Spurdle  
Sent: 11 September 2020 03:03
To: Martin Maechler 
Cc: Georgi Boshnakov ; r-devel@r-project.org
Subject: Re: [Rd] more Matrix weirdness

> > "These operators are also implicit S4 generics, but as
> > primitives, S4 methods will be dispatched only on S4
> > objects ‘x’."

> Yes, exactly,  very well found, Georgi!

I'm sorry Martin, but I don't understand your point here.

I'm assuming that you want the (S3) matrix, x, to be converted to an
(S4) Matrix.

However, this is not a question of method dispatch, as such.
But rather a question of type conversion (integer to numeric to complex, etc).

Specifically, can/should automatic type conversion, convert an S3 data type to 
an S4 data type, even where user-defined data types are involved?
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] more Matrix weirdness

2020-09-11 Thread Georgi Boshnakov
If the current semantics is to be kept, one approach might be to insert in the 
internal code of "[<-" something like the equivalent of 

If(isS4(value))
value <- coreData(value)

with the contract that classes that wish to be treated as equivalent to basic 
vector classes define suitable method for coreData(). So, a matrix class would 
return a plain matrix, a vector class - a vector, etc. This is akin to 
coredata() for time series in package 'zoo' and others. 


Georgi Boshnakov

-Original Message-
From: Martin Maechler  
Sent: 10 September 2020 13:48
To: Georgi Boshnakov 
Cc: r-devel@r-project.org
Subject: Re: [Rd] more Matrix weirdness

> Georgi Boshnakov 
> on Wed, 9 Sep 2020 10:48:56 + writes:

> I think that this is because `[<-` dispatches on S4
> methods only if the first argument is S4.  ?"[<-" says:

> "These operators are also implicit S4 generics, but as
> primitives, S4 methods will be dispatched only on S4
> objects ‘x’."

> Georgi Boshnakov

Yes, exactly,  very well found, Georgi!

This is something I would have wanted different for years, exactly because of 
several such problems with the Matrix package of which I'm the maintainer.

Long time ago I had also looked if I saw how to fix this behavior inside 
'methods' (i.e. the S4 infrastructure pkg) and I think in this case also, 
inside R's basic C code.

At the time (~ 10 yrs ago) I gave up, but don't remember why.

I'm happy if you create a formal bug report, possibly "wishlist"
as it is documented behavior, for this infelicity...
and then I will probably add the  'HELPWANTED'  keyword.

Martin


> -Original Message-


> Message: 19 Date: Tue, 8 Sep 2020 22:04:44 -0400 From: Ben
> Bolker  To: r-devel
>  Subject: [Rd] more Matrix
> weirdness Message-ID:
> 
> Content-Type: text/plain; charset="utf-8"; Format="flowed"

>Am I being too optimistic in expecting this (mixing and
> matching matrices and Matrices) to work?  If x is a matrix
> and m is a Matrix, replacing a commensurately sized
> sub-matrix of x with m throws "number of items to replace
> is not a multiple of replacement length" ...

> x <- matrix(0,nrow=3,ncol=10,
> dimnames=list(letters[1:3],LETTERS[1:10])) rr <-
> c("a","b","c") cc <- c("B","C","E") m <-
> Matrix(matrix(1:9,3,3)) x[rr,cc] <- m

> cheers Ben Bolker




> --

> Message: 20 Date: Wed, 9 Sep 2020 07:03:47 +0100 From: Rui
> Barradas  To: Ben Bolker
> , r-devel 
> Subject: Re: [Rd] more Matrix weirdness Message-ID:
> <7037975c-22b6-6eca-d871-743eead53...@sapo.pt>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"

> Hello,

> R 4.0.2 on Ubuntu 20.04, sessionInfo() below.

> I can reproduce this, sort of. The error I'm getting is
> different:


> x[rr, cc] <- m #Error in x[rr, cc] <- m : replacement has
> length zero

> But if I check lengths and dimensions, they are
> identical().

> identical(length(x[rr, cc]), length(m)) #[1] TRUE
> identical(dim(x[rr, cc]), dim(m)) #[1] TRUE


> What works is


> x[rr, cc] <- as.matrix(m)

> I ran Ben's code on RStudio 1.3.1073, the following is run
> with Rscript and the error message is the same.


> rui@rui:~$ Rscript --vanilla rhelp.R R version 4.0.2
> (2020-06-22) Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04.1 LTS

> Matrix products: default BLAS:
> /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK:
> /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

> locale: [1] LC_CTYPE=pt_PT.UTF-8 LC_NUMERIC=C [3]
> LC_TIME=pt_PT.UTF-8 LC_COLLATE=pt_PT.UTF-8 [5]
> LC_MONETARY=pt_PT.UTF-8 LC_MESSAGES=pt_PT.UTF-8 [7]
> LC_PAPER=pt_PT.UTF-8 LC_NAME=C [9] LC_ADDRESS=C
> LC_TELEPHONE=C [11] LC_MEASUREMENT=pt_PT.UTF-8
> LC_IDENTIFICATION=C

> attached base packages: [1] stats graphics grDevices utils
> datasets methods base

> other attached packages: [1] Matrix_1.2-18

> loaded via a namespace (and not attached): [1]
> compiler_4.0.2 grid_4.0.2 lattice_0.20-41 Error in x[rr,
> cc] <- m : number of items to replace is not a multiple of
> replacement length Execution halted


> Hope this helps,

> Rui Barradas


> Às 03:04 de 09/09/20, Ben Bolker escreveu:
>>   Am I being too optimistic in expecting this (mixing and
>> matching matrices and Matrices) to work?  If x is a
>> matrix and m is a Matrix, replacing a commensurately
>> sized sub-matrix of x with m throws "number of items to
>> replace is not a multiple of replacement length" ...
>> 
>> x <- matrix(0,nrow=3,ncol=10,
>> dimnames=list(letters[1:3],LETTERS[1:10])) rr <-
>> c("a","b","c") cc <- c("B","C","E") m <-
>> Matrix(matrix(1:9,3,3)) x[rr,cc] <- m
>> 
>>    cheers     Ben Bolker
>> 
>> ___