Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-09 Thread Aaron Lun
Martin Morgan wrote:
> On 04/08/2017 12:29 PM, Aaron Lun wrote:
>> Martin Morgan wrote:
>>> On 04/08/2017 08:16 AM, Aaron Lun wrote:
 On 07/04/17 20:46, luke-tier...@uiowa.edu wrote:
> On Fri, 7 Apr 2017, Hervé Pagès wrote:
>
>> On 04/07/2017 05:37 AM, luke-tier...@uiowa.edu wrote:
>>>  On Fri, 7 Apr 2017, Hervé Pagès wrote:
>>>
  On 04/06/2017 03:29 AM, Michael Lawrence wrote:
>  On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
>   wrote:
>>  On 04/06/2017 05:33 AM, Aaron Lun wrote:
>  The tool is not perfect, so assess each report
>>> carefully.
>  I get a lot of warnings because the tool seems to consider that
  extracting an attribute (with getAttrib(x, ...)) or extracting
 the
  slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
  returns an SEXP that needs protection. I always assumed that it
  didn't because my understanding is that the returned SEXP is
 pointing
  to a part of a pre-existing object ('x') and not to a newly
 created
  one. So I decided I could treat it like the SEXP returned by
  VECTOR_ELT(), which, AFAIK, doesn't need protection.
>  So I suspect these warnings are false positives but I'm not 100%
>>> sure.
>>>
>>>  If you are not 100% sure then you should protect :-)
>>>
>>>  There are some cases, in particular related to compact row
>>> names on
>>>  data frames, where getAttrib will allocate.
>>
>> Seriously? So setAttrib(x, ..., getAttrib) is not going to be a
>> no-op
>> anymore? Should I worry that VECTOR_ELT() will also expand some sort
>> of compact list element? Why not keep these things low-level
>> getters/setters that return whatever the real thing is and use
>> higher-level accessors for returning the expanded version of the
>> thing?
>
> Seriously: it'sbeen that way since r37807 in 2006.
>
> If you want to read about some related future directions you can
> look at
> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.
>
> luke

 I was curious about this so I checked out what R-exts had to say
 involving set/getAttrib. Funnily enough, the example it gives in
 Section
 5.9.4 seems to be incorrect in its UNPROTECTing.

 #include 
 #include 

 SEXP out(SEXP x, SEXP y)
 {
  int nx = length(x), ny = length(y);
  SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
  double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);

  for(int i = 0; i < nx; i++) {
  double tmp = rx[i];
  for(int j = 0; j < ny; j++)
  rans[i + nx*j] = tmp * ry[j];
  }

  SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
  SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
  SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
  setAttrib(ans, R_DimNamesSymbol, dimnames);


  UNPROTECT(3);
  return ans;
 }

 There are two PROTECT calls but an UNPROTECT(3), which triggers a
 stack
 imbalance warning upon trying to run .Call("out", ...) in R.
>>>
>>> Yes, that should be UNPROTECT(2). svn blame says the error was
>>> introduced when allocMatrix() was introduced; prior to that the code
>>> had allocVector(), then set dim and dimnames.
>>>
>>> As for whether to PROTECT or not, my analysis would be...
>>>
>>> SET_VECTOR_ELT does not (currently) allocate (except on error) so
>>> there is no opportunity for the garbage collector to run, hence no
>>> need to PROTECT.
>>>
>>> Further, getAttrib() (currently) allocates only if (1) the attribute
>>> is R_RowNamesSymbol and the row names are stored in compact format
>>> c(NA_integer_, nrow); or (2) the first argument is a classic pairlist
>>> or language SEXP. None of these conditions apply, so the return value
>>> of getAttrib() is PROTECTed anyway.
>>>
>>> Luke's analysis would be more straight-forward: if in doubt, PROTECT.
>>>
>>> I think Herve, Gabe, and perhaps Michael would take Luke's advice, and
>>> maybe also note that my advice, in addition to being an analysis of
>>> some surprisingly complicated code by a practitioner of dubious
>>> credibility, involves the current state of affairs, and you never know
>>> (and apparently ALTREP makes it less certain) what the future will
>>> hold. So they'd probably say PROTECT.
>>>
>>> One might be tempted to write
>>>
>>>  SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
>>>
>>> but I'm not sure whether C guarantees that function arguments are
>>> fully evaluated, maybe it's legal for a compiler to evaluate
>>> getAttrib(), then do some more work with other arguments, then
>>> evaluate PROTECT(), so long as the overall logic is not disrupted. So
>>> the 'if in doubt' 

Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-09 Thread Martin Morgan

On 04/08/2017 12:29 PM, Aaron Lun wrote:

Martin Morgan wrote:

On 04/08/2017 08:16 AM, Aaron Lun wrote:

On 07/04/17 20:46, luke-tier...@uiowa.edu wrote:

On Fri, 7 Apr 2017, Hervé Pagès wrote:


On 04/07/2017 05:37 AM, luke-tier...@uiowa.edu wrote:

 On Fri, 7 Apr 2017, Hervé Pagès wrote:


 On 04/06/2017 03:29 AM, Michael Lawrence wrote:

 On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
  wrote:

 On 04/06/2017 05:33 AM, Aaron Lun wrote:

 The tool is not perfect, so assess each report

carefully.

 I get a lot of warnings because the tool seems to consider that

 extracting an attribute (with getAttrib(x, ...)) or extracting the
 slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
 returns an SEXP that needs protection. I always assumed that it
 didn't because my understanding is that the returned SEXP is
pointing
 to a part of a pre-existing object ('x') and not to a newly created
 one. So I decided I could treat it like the SEXP returned by
 VECTOR_ELT(), which, AFAIK, doesn't need protection.

 So I suspect these warnings are false positives but I'm not 100%

sure.

 If you are not 100% sure then you should protect :-)

 There are some cases, in particular related to compact row names on
 data frames, where getAttrib will allocate.


Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op
anymore? Should I worry that VECTOR_ELT() will also expand some sort
of compact list element? Why not keep these things low-level
getters/setters that return whatever the real thing is and use
higher-level accessors for returning the expanded version of the
thing?


Seriously: it'sbeen that way since r37807 in 2006.

If you want to read about some related future directions you can
look at
https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.

luke


I was curious about this so I checked out what R-exts had to say
involving set/getAttrib. Funnily enough, the example it gives in Section
5.9.4 seems to be incorrect in its UNPROTECTing.

#include 
#include 

SEXP out(SEXP x, SEXP y)
{
 int nx = length(x), ny = length(y);
 SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
 double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);

 for(int i = 0; i < nx; i++) {
 double tmp = rx[i];
 for(int j = 0; j < ny; j++)
 rans[i + nx*j] = tmp * ry[j];
 }

 SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
 SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
 SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
 setAttrib(ans, R_DimNamesSymbol, dimnames);


 UNPROTECT(3);
 return ans;
}

There are two PROTECT calls but an UNPROTECT(3), which triggers a stack
imbalance warning upon trying to run .Call("out", ...) in R.


Yes, that should be UNPROTECT(2). svn blame says the error was
introduced when allocMatrix() was introduced; prior to that the code
had allocVector(), then set dim and dimnames.

As for whether to PROTECT or not, my analysis would be...

SET_VECTOR_ELT does not (currently) allocate (except on error) so
there is no opportunity for the garbage collector to run, hence no
need to PROTECT.

Further, getAttrib() (currently) allocates only if (1) the attribute
is R_RowNamesSymbol and the row names are stored in compact format
c(NA_integer_, nrow); or (2) the first argument is a classic pairlist
or language SEXP. None of these conditions apply, so the return value
of getAttrib() is PROTECTed anyway.

Luke's analysis would be more straight-forward: if in doubt, PROTECT.

I think Herve, Gabe, and perhaps Michael would take Luke's advice, and
maybe also note that my advice, in addition to being an analysis of
some surprisingly complicated code by a practitioner of dubious
credibility, involves the current state of affairs, and you never know
(and apparently ALTREP makes it less certain) what the future will
hold. So they'd probably say PROTECT.

One might be tempted to write

 SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));

but I'm not sure whether C guarantees that function arguments are
fully evaluated, maybe it's legal for a compiler to evaluate
getAttrib(), then do some more work with other arguments, then
evaluate PROTECT(), so long as the overall logic is not disrupted. So
the 'if in doubt' argument would make me write

SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 0, nms);

I think , in C is called a 'sequence point'. Google takes me to

  https://msdn.microsoft.com/en-us/library/azk8zbxd.aspx

where it seems like the left operand of ',' are fully evaluated before
proceeding, and furthermore that arguments to functions are evaluated
before entering the function, implying that it is safe to use the
one-liner

 SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));

At any rate I changed the example in R-exts to UNPROTECT(2), leaving
the nuances for further discussion.

Martin


I wonder if the following 

Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-08 Thread Hervé Pagès



On 04/08/2017 06:50 AM, Martin Morgan wrote:

On 04/08/2017 08:16 AM, Aaron Lun wrote:

On 07/04/17 20:46, luke-tier...@uiowa.edu wrote:

On Fri, 7 Apr 2017, Hervé Pagès wrote:


On 04/07/2017 05:37 AM, luke-tier...@uiowa.edu wrote:

 On Fri, 7 Apr 2017, Hervé Pagès wrote:


 On 04/06/2017 03:29 AM, Michael Lawrence wrote:

 On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
  wrote:

 On 04/06/2017 05:33 AM, Aaron Lun wrote:

 The tool is not perfect, so assess each report

carefully.

 I get a lot of warnings because the tool seems to consider that

 extracting an attribute (with getAttrib(x, ...)) or extracting the
 slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
 returns an SEXP that needs protection. I always assumed that it
 didn't because my understanding is that the returned SEXP is
pointing
 to a part of a pre-existing object ('x') and not to a newly created
 one. So I decided I could treat it like the SEXP returned by
 VECTOR_ELT(), which, AFAIK, doesn't need protection.

 So I suspect these warnings are false positives but I'm not 100%

sure.

 If you are not 100% sure then you should protect :-)

 There are some cases, in particular related to compact row names on
 data frames, where getAttrib will allocate.


Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op
anymore? Should I worry that VECTOR_ELT() will also expand some sort
of compact list element? Why not keep these things low-level
getters/setters that return whatever the real thing is and use
higher-level accessors for returning the expanded version of the thing?


Seriously: it'sbeen that way since r37807 in 2006.

If you want to read about some related future directions you can look at
https://urldefense.proofpoint.com/v2/url?u=https-3A__svn.r-2Dproject.org_R_branches_ALTREP_ALTREP.html=DwIF-g=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=ECmvCrpSP99jd6o3J4LkX4nL1PKJiNM1Ky6_-c7ob5k=S-eq87dmcXe7_GR61c5ZPGzqT9V2booIH5P7G_Jch18=
.

luke


I was curious about this so I checked out what R-exts had to say
involving set/getAttrib. Funnily enough, the example it gives in Section
5.9.4 seems to be incorrect in its UNPROTECTing.

#include 
#include 

SEXP out(SEXP x, SEXP y)
{
 int nx = length(x), ny = length(y);
 SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
 double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);

 for(int i = 0; i < nx; i++) {
 double tmp = rx[i];
 for(int j = 0; j < ny; j++)
 rans[i + nx*j] = tmp * ry[j];
 }

 SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
 SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
 SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
 setAttrib(ans, R_DimNamesSymbol, dimnames);


 UNPROTECT(3);
 return ans;
}

There are two PROTECT calls but an UNPROTECT(3), which triggers a stack
imbalance warning upon trying to run .Call("out", ...) in R.


Yes, that should be UNPROTECT(2). svn blame says the error was
introduced when allocMatrix() was introduced; prior to that the code had
allocVector(), then set dim and dimnames.

As for whether to PROTECT or not, my analysis would be...

SET_VECTOR_ELT does not (currently) allocate (except on error) so there
is no opportunity for the garbage collector to run, hence no need to
PROTECT.

Further, getAttrib() (currently) allocates only if (1) the attribute is
R_RowNamesSymbol and the row names are stored in compact format
c(NA_integer_, nrow); or (2) the first argument is a classic pairlist or
language SEXP. None of these conditions apply, so the return value of
getAttrib() is PROTECTed anyway.

Luke's analysis would be more straight-forward: if in doubt, PROTECT.

I think Herve, Gabe, and perhaps Michael would take Luke's advice, and
maybe also note that my advice, in addition to being an analysis of some
surprisingly complicated code by a practitioner of dubious credibility,
involves the current state of affairs, and you never know (and
apparently ALTREP makes it less certain) what the future will hold. So
they'd probably say PROTECT.

One might be tempted to write

 SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));

but I'm not sure whether C guarantees that function arguments are fully
evaluated, maybe it's legal for a compiler to evaluate getAttrib(), then
do some more work with other arguments, then evaluate PROTECT(), so long
as the overall logic is not disrupted. So the 'if in doubt' argument
would make me write

SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 0, nms);

I think , in C is called a 'sequence point'. Google takes me to


https://urldefense.proofpoint.com/v2/url?u=https-3A__msdn.microsoft.com_en-2Dus_library_azk8zbxd.aspx=DwIF-g=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=ECmvCrpSP99jd6o3J4LkX4nL1PKJiNM1Ky6_-c7ob5k=ekY5D7Za0NSpYmZxnR3ONu7u8f_qyDme47VeHsBWp6w=

where it seems like the 

Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-08 Thread Aaron Lun
Martin Morgan wrote:
> On 04/08/2017 08:16 AM, Aaron Lun wrote:
>> On 07/04/17 20:46, luke-tier...@uiowa.edu wrote:
>>> On Fri, 7 Apr 2017, Hervé Pagès wrote:
>>>
 On 04/07/2017 05:37 AM, luke-tier...@uiowa.edu wrote:
>  On Fri, 7 Apr 2017, Hervé Pagès wrote:
>
>>  On 04/06/2017 03:29 AM, Michael Lawrence wrote:
>>>  On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
>>>   wrote:
  On 04/06/2017 05:33 AM, Aaron Lun wrote:
>>>  The tool is not perfect, so assess each report
> carefully.
>>>  I get a lot of warnings because the tool seems to consider that
>>  extracting an attribute (with getAttrib(x, ...)) or extracting the
>>  slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
>>  returns an SEXP that needs protection. I always assumed that it
>>  didn't because my understanding is that the returned SEXP is
>> pointing
>>  to a part of a pre-existing object ('x') and not to a newly created
>>  one. So I decided I could treat it like the SEXP returned by
>>  VECTOR_ELT(), which, AFAIK, doesn't need protection.
>>>  So I suspect these warnings are false positives but I'm not 100%
> sure.
>
>  If you are not 100% sure then you should protect :-)
>
>  There are some cases, in particular related to compact row names on
>  data frames, where getAttrib will allocate.

 Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op
 anymore? Should I worry that VECTOR_ELT() will also expand some sort
 of compact list element? Why not keep these things low-level
 getters/setters that return whatever the real thing is and use
 higher-level accessors for returning the expanded version of the
 thing?
>>>
>>> Seriously: it'sbeen that way since r37807 in 2006.
>>>
>>> If you want to read about some related future directions you can
>>> look at
>>> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.
>>>
>>> luke
>>
>> I was curious about this so I checked out what R-exts had to say
>> involving set/getAttrib. Funnily enough, the example it gives in Section
>> 5.9.4 seems to be incorrect in its UNPROTECTing.
>>
>> #include 
>> #include 
>>
>> SEXP out(SEXP x, SEXP y)
>> {
>>  int nx = length(x), ny = length(y);
>>  SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
>>  double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);
>>
>>  for(int i = 0; i < nx; i++) {
>>  double tmp = rx[i];
>>  for(int j = 0; j < ny; j++)
>>  rans[i + nx*j] = tmp * ry[j];
>>  }
>>
>>  SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
>>  SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
>>  SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
>>  setAttrib(ans, R_DimNamesSymbol, dimnames);
>>
>>
>>  UNPROTECT(3);
>>  return ans;
>> }
>>
>> There are two PROTECT calls but an UNPROTECT(3), which triggers a stack
>> imbalance warning upon trying to run .Call("out", ...) in R.
>
> Yes, that should be UNPROTECT(2). svn blame says the error was
> introduced when allocMatrix() was introduced; prior to that the code
> had allocVector(), then set dim and dimnames.
>
> As for whether to PROTECT or not, my analysis would be...
>
> SET_VECTOR_ELT does not (currently) allocate (except on error) so
> there is no opportunity for the garbage collector to run, hence no
> need to PROTECT.
>
> Further, getAttrib() (currently) allocates only if (1) the attribute
> is R_RowNamesSymbol and the row names are stored in compact format
> c(NA_integer_, nrow); or (2) the first argument is a classic pairlist
> or language SEXP. None of these conditions apply, so the return value
> of getAttrib() is PROTECTed anyway.
>
> Luke's analysis would be more straight-forward: if in doubt, PROTECT.
>
> I think Herve, Gabe, and perhaps Michael would take Luke's advice, and
> maybe also note that my advice, in addition to being an analysis of
> some surprisingly complicated code by a practitioner of dubious
> credibility, involves the current state of affairs, and you never know
> (and apparently ALTREP makes it less certain) what the future will
> hold. So they'd probably say PROTECT.
>
> One might be tempted to write
>
>  SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
>
> but I'm not sure whether C guarantees that function arguments are
> fully evaluated, maybe it's legal for a compiler to evaluate
> getAttrib(), then do some more work with other arguments, then
> evaluate PROTECT(), so long as the overall logic is not disrupted. So
> the 'if in doubt' argument would make me write
>
> SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol));
> SET_VECTOR_ELT(dimnames, 0, nms);
>
> I think , in C is called a 'sequence point'. Google takes me to
>
>   https://msdn.microsoft.com/en-us/library/azk8zbxd.aspx
>
> where it seems like the left operand of ',' are fully evaluated before
> 

Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-08 Thread Martin Morgan

On 04/08/2017 08:16 AM, Aaron Lun wrote:

On 07/04/17 20:46, luke-tier...@uiowa.edu wrote:

On Fri, 7 Apr 2017, Hervé Pagès wrote:


On 04/07/2017 05:37 AM, luke-tier...@uiowa.edu wrote:

 On Fri, 7 Apr 2017, Hervé Pagès wrote:


 On 04/06/2017 03:29 AM, Michael Lawrence wrote:

 On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
  wrote:

 On 04/06/2017 05:33 AM, Aaron Lun wrote:

 The tool is not perfect, so assess each report

carefully.

 I get a lot of warnings because the tool seems to consider that

 extracting an attribute (with getAttrib(x, ...)) or extracting the
 slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
 returns an SEXP that needs protection. I always assumed that it
 didn't because my understanding is that the returned SEXP is pointing
 to a part of a pre-existing object ('x') and not to a newly created
 one. So I decided I could treat it like the SEXP returned by
 VECTOR_ELT(), which, AFAIK, doesn't need protection.

 So I suspect these warnings are false positives but I'm not 100%

sure.

 If you are not 100% sure then you should protect :-)

 There are some cases, in particular related to compact row names on
 data frames, where getAttrib will allocate.


Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op
anymore? Should I worry that VECTOR_ELT() will also expand some sort
of compact list element? Why not keep these things low-level
getters/setters that return whatever the real thing is and use
higher-level accessors for returning the expanded version of the thing?


Seriously: it'sbeen that way since r37807 in 2006.

If you want to read about some related future directions you can look at
https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.

luke


I was curious about this so I checked out what R-exts had to say
involving set/getAttrib. Funnily enough, the example it gives in Section
5.9.4 seems to be incorrect in its UNPROTECTing.

#include 
#include 

SEXP out(SEXP x, SEXP y)
{
 int nx = length(x), ny = length(y);
 SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
 double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);

 for(int i = 0; i < nx; i++) {
 double tmp = rx[i];
 for(int j = 0; j < ny; j++)
 rans[i + nx*j] = tmp * ry[j];
 }

 SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
 SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
 SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
 setAttrib(ans, R_DimNamesSymbol, dimnames);


 UNPROTECT(3);
 return ans;
}

There are two PROTECT calls but an UNPROTECT(3), which triggers a stack
imbalance warning upon trying to run .Call("out", ...) in R.


Yes, that should be UNPROTECT(2). svn blame says the error was 
introduced when allocMatrix() was introduced; prior to that the code had 
allocVector(), then set dim and dimnames.


As for whether to PROTECT or not, my analysis would be...

SET_VECTOR_ELT does not (currently) allocate (except on error) so there 
is no opportunity for the garbage collector to run, hence no need to 
PROTECT.


Further, getAttrib() (currently) allocates only if (1) the attribute is 
R_RowNamesSymbol and the row names are stored in compact format 
c(NA_integer_, nrow); or (2) the first argument is a classic pairlist or 
language SEXP. None of these conditions apply, so the return value of 
getAttrib() is PROTECTed anyway.


Luke's analysis would be more straight-forward: if in doubt, PROTECT.

I think Herve, Gabe, and perhaps Michael would take Luke's advice, and 
maybe also note that my advice, in addition to being an analysis of some 
surprisingly complicated code by a practitioner of dubious credibility, 
involves the current state of affairs, and you never know (and 
apparently ALTREP makes it less certain) what the future will hold. So 
they'd probably say PROTECT.


One might be tempted to write

 SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));

but I'm not sure whether C guarantees that function arguments are fully 
evaluated, maybe it's legal for a compiler to evaluate getAttrib(), then 
do some more work with other arguments, then evaluate PROTECT(), so long 
as the overall logic is not disrupted. So the 'if in doubt' argument 
would make me write


SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 0, nms);

I think , in C is called a 'sequence point'. Google takes me to

  https://msdn.microsoft.com/en-us/library/azk8zbxd.aspx

where it seems like the left operand of ',' are fully evaluated before 
proceeding, and furthermore that arguments to functions are evaluated 
before entering the function, implying that it is safe to use the one-liner


 SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));

At any rate I changed the example in R-exts to UNPROTECT(2), leaving the 
nuances for further discussion.


Martin



Anyway, getting back to the topic of this thread; if we were 

Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-08 Thread Aaron Lun
On 07/04/17 20:46, luke-tier...@uiowa.edu wrote:
> On Fri, 7 Apr 2017, Hervé Pagès wrote:
>
>> On 04/07/2017 05:37 AM, luke-tier...@uiowa.edu wrote:
>>>  On Fri, 7 Apr 2017, Hervé Pagès wrote:
>>>
>>> >  On 04/06/2017 03:29 AM, Michael Lawrence wrote:
>>> > >  On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
>>> > >   wrote:
>>> > > >  On 04/06/2017 05:33 AM, Aaron Lun wrote:
>>> > > > > > > > > > >  The tool is not perfect, so assess each report
>>> carefully.
>>> > >  I get a lot of warnings because the tool seems to consider that
>>> >  extracting an attribute (with getAttrib(x, ...)) or extracting the
>>> >  slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
>>> >  returns an SEXP that needs protection. I always assumed that it
>>> >  didn't because my understanding is that the returned SEXP is pointing
>>> >  to a part of a pre-existing object ('x') and not to a newly created
>>> >  one. So I decided I could treat it like the SEXP returned by
>>> >  VECTOR_ELT(), which, AFAIK, doesn't need protection.
>>> > >  So I suspect these warnings are false positives but I'm not 100%
>>> sure.
>>>
>>>  If you are not 100% sure then you should protect :-)
>>>
>>>  There are some cases, in particular related to compact row names on
>>>  data frames, where getAttrib will allocate.
>>
>> Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op
>> anymore? Should I worry that VECTOR_ELT() will also expand some sort
>> of compact list element? Why not keep these things low-level
>> getters/setters that return whatever the real thing is and use
>> higher-level accessors for returning the expanded version of the thing?
>
> Seriously: it'sbeen that way since r37807 in 2006.
>
> If you want to read about some related future directions you can look at
> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.
>
> luke

I was curious about this so I checked out what R-exts had to say 
involving set/getAttrib. Funnily enough, the example it gives in Section 
5.9.4 seems to be incorrect in its UNPROTECTing.

#include 
#include 

SEXP out(SEXP x, SEXP y)
{
 int nx = length(x), ny = length(y);
 SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
 double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);

 for(int i = 0; i < nx; i++) {
 double tmp = rx[i];
 for(int j = 0; j < ny; j++)
 rans[i + nx*j] = tmp * ry[j];
 }

 SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
 SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
 SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
 setAttrib(ans, R_DimNamesSymbol, dimnames);


 UNPROTECT(3);
 return ans;
}

There are two PROTECT calls but an UNPROTECT(3), which triggers a stack 
imbalance warning upon trying to run .Call("out", ...) in R.

Anyway, getting back to the topic of this thread; if we were to pretend 
that getAttrib() allocates in the above example, would that mean that 
both getAttrib() calls now need to be PROTECTed by the developer? Or is 
this handled somewhere internally?

>>
>> Thanks,
>> H.
>>
>>>
>>>  Best,
>>>
>>>  luke
>>>
>>> > > > > > > > > > > > > >  I also get a warning on almost every C++
>>> function I've written,
>>> > > > >  because
>>> > > > >  I use the following code to handle exceptions:
>>> > > > > > > > >   SEXP output=PROTECT(allocVector(...));
>>> > > > >   try {
>>> > > > >   // do something that might raise an exception
>>> > > > >   } catch (std::exception& e) {
>>> > > > >   UNPROTECT(1);
>>> > > > >   throw; // break out of this part of the function
>>> > > > >   }
>>> > > > >   UNPROTECT(1);
>>> > > > >   return output;
>>> > > > > > > > >  Presumably the check doesn't account for transfer of
>>> control to > > > >  the
>>> > > > >  catch block. I find that R itself is pretty good at
>>> complaining > > > >  about
>>> > > > >  stack imbalances during execution of tests, examples, etc.
>>> > > > > > > > > >  'My' packages
>>> > > > > >  (Rsamtools, DirichletMultinomial) had several false
>>> positives > > > > >  (all
>>> > > > > >  associated with use of an attribute of a protected SEXP),
>>> one > > > > >  subtle
>>> > > > > >  problem (a symbol from a PROTECT'ed package name space;
>>> the > > > > >  symbol
>>> > > > > >  could
>>> > > > > >  in theory be an active binding and the value obtained not
>>> > > > > >  PROTECTed by
>>> > > > > >  the name space), and a genuine bug
>>> > > > > > > > > > >  tag = NEW_CHARACTER(n);
>>> > > > > >  for (int j = 0; j < n; ++j)
>>> > > > > >  SET_STRING_ELT(tag, j, NA_STRING);
>>> > > > > >  if ('A' == aux[0]) {
>>> > > > > >  buf_A = R_alloc(2, sizeof(char));  #
>>> <<- bug
>>> > > > > >  buf_A[1] = '\0';
>>> > > > > >  }
>>> > > > > >  ...
>>> > > > > >  

Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-07 Thread Gabe Becker
On Fri, Apr 7, 2017 at 12:46 PM,  wrote:

> On Fri, 7 Apr 2017, Hervé Pagès wrote:
>
> On 04/07/2017 05:37 AM, luke-tier...@uiowa.edu wrote:
>>
>>>  On Fri, 7 Apr 2017, Hervé Pagès wrote:
>>>
>>> >  On 04/06/2017 03:29 AM, Michael Lawrence wrote:
>>> > >  On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
>>> > >   wrote:
>>> > > >  On 04/06/2017 05:33 AM, Aaron Lun wrote:
>>> > > > > > > > > > >  The tool is not perfect, so assess each report
>>> carefully.
>>> > >  I get a lot of warnings because the tool seems to consider that
>>> >  extracting an attribute (with getAttrib(x, ...)) or extracting the
>>> >  slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
>>> >  returns an SEXP that needs protection. I always assumed that it
>>> >  didn't because my understanding is that the returned SEXP is pointing
>>> >  to a part of a pre-existing object ('x') and not to a newly created
>>> >  one. So I decided I could treat it like the SEXP returned by
>>> >  VECTOR_ELT(), which, AFAIK, doesn't need protection.
>>> > >  So I suspect these warnings are false positives but I'm not 100%
>>> sure.
>>>
>>>  If you are not 100% sure then you should protect :-)
>>>
>>>  There are some cases, in particular related to compact row names on
>>>  data frames, where getAttrib will allocate.
>>>
>>
>> Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op
>> anymore? Should I worry that VECTOR_ELT() will also expand some sort
>> of compact list element? Why not keep these things low-level
>> getters/setters that return whatever the real thing is and use
>> higher-level accessors for returning the expanded version of the thing?
>>
>
> Seriously: it's been that way since r37807 in 2006.
>
> If you want to read about some related future directions you can look at
> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.


Indeed. I was wondering whether to bring this up here.

In a (hopefully near future) version of R-devel, doing, e.g., INTEGER(x)
could allocate.  There is a way to ask it to give you NULL instead of
allocating if it would need to, but the point being it's probably going to
get much harder to safely be clever about avoiding PROTECT'ing. (Luke put
in temporary suspension of GC in some places, but I don't recall the exact
details of where that was used).

As a side note to the above, you'll need to do INTEGER(x) less often than
you did before. There will be a new INTEGER_ELT and INTEGER_GET_REGION
macros which (I think) will be guaranteed to not cause SEXP allocation.

In terms of why, at least in the ALTREP case, it's so that these things can
be passed directly to the R internals and be treated like whatever
lowl-level type of thing they are (e.g. numeric vector, string vector,
list, etc). This seamless backwards compatiblity requires that code which
doesn't use the INTEGER_ELT and INTEGER_GET_REGION (or analogues) macros
needs to have INTEGER(X) work and give it the pointer it expects, which
won't necessarily exist before the first time it is required.

Best,
~G


>
> luke
>
>
>
>
>> Thanks,
>> H.
>>
>>
>>>  Best,
>>>
>>>  luke
>>>
>>> > > > > > > > > > > > > >  I also get a warning on almost every C++
>>> function I've written,
>>> > > > >  because
>>> > > > >  I use the following code to handle exceptions:
>>> > > > > > > > >   SEXP output=PROTECT(allocVector(...));
>>> > > > >   try {
>>> > > > >   // do something that might raise an exception
>>> > > > >   } catch (std::exception& e) {
>>> > > > >   UNPROTECT(1);
>>> > > > >   throw; // break out of this part of the function
>>> > > > >   }
>>> > > > >   UNPROTECT(1);
>>> > > > >   return output;
>>> > > > > > > > >  Presumably the check doesn't account for transfer of
>>> control to > > > >  the
>>> > > > >  catch block. I find that R itself is pretty good at complaining
>>> > > > >  about
>>> > > > >  stack imbalances during execution of tests, examples, etc.
>>> > > > > > > > > >  'My' packages
>>> > > > > >  (Rsamtools, DirichletMultinomial) had several false positives
>>> > > > > >  (all
>>> > > > > >  associated with use of an attribute of a protected SEXP), one
>>> > > > > >  subtle
>>> > > > > >  problem (a symbol from a PROTECT'ed package name space; the >
>>> > > > >  symbol
>>> > > > > >  could
>>> > > > > >  in theory be an active binding and the value obtained not
>>> > > > > >  PROTECTed by
>>> > > > > >  the name space), and a genuine bug
>>> > > > > > > > > > >  tag = NEW_CHARACTER(n);
>>> > > > > >  for (int j = 0; j < n; ++j)
>>> > > > > >  SET_STRING_ELT(tag, j, NA_STRING);
>>> > > > > >  if ('A' == aux[0]) {
>>> > > > > >  buf_A = R_alloc(2, sizeof(char));  # <<-
>>> bug
>>> > > > > >  buf_A[1] = '\0';
>>> > > > > >  }
>>> > > > > >  ...
>>> > > > > >  

Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-07 Thread luke-tierney

On Fri, 7 Apr 2017, Hervé Pagès wrote:


On 04/07/2017 05:37 AM, luke-tier...@uiowa.edu wrote:

 On Fri, 7 Apr 2017, Hervé Pagès wrote:

>  On 04/06/2017 03:29 AM, Michael Lawrence wrote:
> >  On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
> >   wrote:
> > >  On 04/06/2017 05:33 AM, Aaron Lun wrote:
> > > > > 
> > > > >  The tool is not perfect, so assess each report carefully.
> 
>  I get a lot of warnings because the tool seems to consider that

>  extracting an attribute (with getAttrib(x, ...)) or extracting the
>  slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
>  returns an SEXP that needs protection. I always assumed that it
>  didn't because my understanding is that the returned SEXP is pointing
>  to a part of a pre-existing object ('x') and not to a newly created
>  one. So I decided I could treat it like the SEXP returned by
>  VECTOR_ELT(), which, AFAIK, doesn't need protection.
> 
>  So I suspect these warnings are false positives but I'm not 100% sure.


 If you are not 100% sure then you should protect :-)

 There are some cases, in particular related to compact row names on
 data frames, where getAttrib will allocate.


Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op
anymore? Should I worry that VECTOR_ELT() will also expand some sort
of compact list element? Why not keep these things low-level
getters/setters that return whatever the real thing is and use
higher-level accessors for returning the expanded version of the thing?


Seriously: it's been that way since r37807 in 2006.

If you want to read about some related future directions you can look at
https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.

luke




Thanks,
H.



 Best,

 luke

> 
> > > > 
> > > > 
> > > >  I also get a warning on almost every C++ function I've written,

> > > >  because
> > > >  I use the following code to handle exceptions:
> > > > 
> > > >   SEXP output=PROTECT(allocVector(...));

> > > >   try {
> > > >   // do something that might raise an exception
> > > >   } catch (std::exception& e) {
> > > >   UNPROTECT(1);
> > > >   throw; // break out of this part of the function
> > > >   }
> > > >   UNPROTECT(1);
> > > >   return output;
> > > > 
> > > >  Presumably the check doesn't account for transfer of control to 
> > > >  the
> > > >  catch block. I find that R itself is pretty good at complaining 
> > > >  about

> > > >  stack imbalances during execution of tests, examples, etc.
> > > > 
> > > > >  'My' packages
> > > > >  (Rsamtools, DirichletMultinomial) had several false positives 
> > > > >  (all
> > > > >  associated with use of an attribute of a protected SEXP), one 
> > > > >  subtle
> > > > >  problem (a symbol from a PROTECT'ed package name space; the 
> > > > >  symbol

> > > > >  could
> > > > >  in theory be an active binding and the value obtained not
> > > > >  PROTECTed by
> > > > >  the name space), and a genuine bug
> > > > > 
> > > > >  tag = NEW_CHARACTER(n);

> > > > >  for (int j = 0; j < n; ++j)
> > > > >  SET_STRING_ELT(tag, j, NA_STRING);
> > > > >  if ('A' == aux[0]) {
> > > > >  buf_A = R_alloc(2, sizeof(char));  # <<- bug
> > > > >  buf_A[1] = '\0';
> > > > >  }
> > > > >  ...
> > > > >  SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too
> > > > >  late!
> > > > 
> > > > 
> > > >  I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - 
> > > >  the

> > > >  R_alloc call looks okay to me...
> > > 
> > > 
> > >  yes, tag needs protection.
> > > 
> > >  I attributed the bug to R_alloc because I failed to reason that 
> > >  R_alloc

> > >  (obviously) allocates and hence can trigger a garbage collection.
> > > 
> > >  Somehow it reflects my approach to PROTECTion, probably not shared 
> > >  by

> > >  everyone. I like to PROTECT only when necessary, rather than
> > >  indiscriminately. Probably this has no practical consequence in
> > >  terms of
> > >  performance, making the code a little easier to read at the expense 
> > >  of

> > >  exposing me to bugs like this.
> > > 
> > 
> >  I guess it's a tradeoff between syntactic complexity and logical

> >  complexity. You have to think pretty hard to minimize use of the
> >  protect stack.
> 
>  I prefer to call it logical obscurity ;-)
> 
>  The hard thinking consists in assessing whether or not the code between

>  the line where a new SEXP is allocated (line X) and the line where
>  it's put in a safe place (line Y) can trigger garbage collection.
>  Hard to figure out in general indeed, but not impossible! Problem
>  is that the result of this assessment is valid at a certain point
>  in time but might change in the future, even if your code has not
>  changed.
> 
>  So a dangerous game for virtually zero benefits.
> 
> > 
> >  One 

Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-07 Thread Hervé Pagès

On 04/07/2017 05:37 AM, luke-tier...@uiowa.edu wrote:

On Fri, 7 Apr 2017, Hervé Pagès wrote:


On 04/06/2017 03:29 AM, Michael Lawrence wrote:

On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
 wrote:

On 04/06/2017 05:33 AM, Aaron Lun wrote:


The tool is not perfect, so assess each report carefully.


I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.

So I suspect these warnings are false positives but I'm not 100% sure.


If you are not 100% sure then you should protect :-)

There are some cases, in particular related to compact row names on
data frames, where getAttrib will allocate.


Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op
anymore? Should I worry that VECTOR_ELT() will also expand some sort
of compact list element? Why not keep these things low-level
getters/setters that return whatever the real thing is and use
higher-level accessors for returning the expanded version of the thing?

Thanks,
H.



Best,

luke






I also get a warning on almost every C++ function I've written,
because
I use the following code to handle exceptions:

 SEXP output=PROTECT(allocVector(...));
 try {
 // do something that might raise an exception
 } catch (std::exception& e) {
 UNPROTECT(1);
 throw; // break out of this part of the function
 }
 UNPROTECT(1);
 return output;

Presumably the check doesn't account for transfer of control to the
catch block. I find that R itself is pretty good at complaining about
stack imbalances during execution of tests, examples, etc.


'My' packages
(Rsamtools, DirichletMultinomial) had several false positives (all
associated with use of an attribute of a protected SEXP), one subtle
problem (a symbol from a PROTECT'ed package name space; the symbol
could
in theory be an active binding and the value obtained not
PROTECTed by
the name space), and a genuine bug

tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char));  # <<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too
late!



I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the
R_alloc call looks okay to me...



yes, tag needs protection.

I attributed the bug to R_alloc because I failed to reason that R_alloc
(obviously) allocates and hence can trigger a garbage collection.

Somehow it reflects my approach to PROTECTion, probably not shared by
everyone. I like to PROTECT only when necessary, rather than
indiscriminately. Probably this has no practical consequence in
terms of
performance, making the code a little easier to read at the expense of
exposing me to bugs like this.



I guess it's a tradeoff between syntactic complexity and logical
complexity. You have to think pretty hard to minimize use of the
protect stack.


I prefer to call it logical obscurity ;-)

The hard thinking consists in assessing whether or not the code between
the line where a new SEXP is allocated (line X) and the line where
it's put in a safe place (line Y) can trigger garbage collection.
Hard to figure out in general indeed, but not impossible! Problem
is that the result of this assessment is valid at a certain point
in time but might change in the future, even if your code has not
changed.

So a dangerous game for virtually zero benefits.



One recommendation might be to UNPROTECT() as soon as the pointer on
the top is unneeded, rather than trying to figure out the number to
pop just before returning to R.


If you PROTECT() in a loop, you definitely want to do that. Otherwise,
does it make a big difference?



One thing that got me is that the order in which C evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.


I got hit by this too long time ago but with defineVar() instead of
R_setAttrib():

 
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html=DwID-g=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw=

H.



Btw, I think my 

Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-07 Thread luke-tierney

On Fri, 7 Apr 2017, Hervé Pagès wrote:


On 04/06/2017 03:29 AM, Michael Lawrence wrote:

On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
 wrote:

On 04/06/2017 05:33 AM, Aaron Lun wrote:


The tool is not perfect, so assess each report carefully.


I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.

So I suspect these warnings are false positives but I'm not 100% sure.


If you are not 100% sure then you should protect :-)

There are some cases, in particular related to compact row names on
data frames, where getAttrib will allocate.

Best,

luke






I also get a warning on almost every C++ function I've written, because
I use the following code to handle exceptions:

 SEXP output=PROTECT(allocVector(...));
 try {
 // do something that might raise an exception
 } catch (std::exception& e) {
 UNPROTECT(1);
 throw; // break out of this part of the function
 }
 UNPROTECT(1);
 return output;

Presumably the check doesn't account for transfer of control to the
catch block. I find that R itself is pretty good at complaining about
stack imbalances during execution of tests, examples, etc.


'My' packages
(Rsamtools, DirichletMultinomial) had several false positives (all
associated with use of an attribute of a protected SEXP), one subtle
problem (a symbol from a PROTECT'ed package name space; the symbol could
in theory be an active binding and the value obtained not PROTECTed by
the name space), and a genuine bug

tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char));  # <<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too late!



I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the
R_alloc call looks okay to me...



yes, tag needs protection.

I attributed the bug to R_alloc because I failed to reason that R_alloc
(obviously) allocates and hence can trigger a garbage collection.

Somehow it reflects my approach to PROTECTion, probably not shared by
everyone. I like to PROTECT only when necessary, rather than
indiscriminately. Probably this has no practical consequence in terms of
performance, making the code a little easier to read at the expense of
exposing me to bugs like this.



I guess it's a tradeoff between syntactic complexity and logical
complexity. You have to think pretty hard to minimize use of the
protect stack.


I prefer to call it logical obscurity ;-)

The hard thinking consists in assessing whether or not the code between
the line where a new SEXP is allocated (line X) and the line where
it's put in a safe place (line Y) can trigger garbage collection.
Hard to figure out in general indeed, but not impossible! Problem
is that the result of this assessment is valid at a certain point
in time but might change in the future, even if your code has not
changed.

So a dangerous game for virtually zero benefits.



One recommendation might be to UNPROTECT() as soon as the pointer on
the top is unneeded, rather than trying to figure out the number to
pop just before returning to R.


If you PROTECT() in a loop, you definitely want to do that. Otherwise,
does it make a big difference?



One thing that got me is that the order in which C evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.


I got hit by this too long time ago but with defineVar() instead of
R_setAttrib():

 https://stat.ethz.ch/pipermail/r-devel/2008-January/048040.html

H.



Btw, I think my package RGtk2 got the record: 1952 errors. Luckily
almost all of them happened inside a few macros and autogenerated
code.


Martin



Cheers,

Aaron
___
Bioc-devel@r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel=DwICAg=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU=




This email message may contain legally privileged and/or...{{dropped:2}}

___

Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-07 Thread Hervé Pagès

On 04/06/2017 03:29 AM, Michael Lawrence wrote:

On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
 wrote:

On 04/06/2017 05:33 AM, Aaron Lun wrote:


The tool is not perfect, so assess each report carefully.


I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.

So I suspect these warnings are false positives but I'm not 100% sure.




I also get a warning on almost every C++ function I've written, because
I use the following code to handle exceptions:

 SEXP output=PROTECT(allocVector(...));
 try {
 // do something that might raise an exception
 } catch (std::exception& e) {
 UNPROTECT(1);
 throw; // break out of this part of the function
 }
 UNPROTECT(1);
 return output;

Presumably the check doesn't account for transfer of control to the
catch block. I find that R itself is pretty good at complaining about
stack imbalances during execution of tests, examples, etc.


'My' packages
(Rsamtools, DirichletMultinomial) had several false positives (all
associated with use of an attribute of a protected SEXP), one subtle
problem (a symbol from a PROTECT'ed package name space; the symbol could
in theory be an active binding and the value obtained not PROTECTed by
the name space), and a genuine bug

tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char));  # <<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too late!



I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the
R_alloc call looks okay to me...



yes, tag needs protection.

I attributed the bug to R_alloc because I failed to reason that R_alloc
(obviously) allocates and hence can trigger a garbage collection.

Somehow it reflects my approach to PROTECTion, probably not shared by
everyone. I like to PROTECT only when necessary, rather than
indiscriminately. Probably this has no practical consequence in terms of
performance, making the code a little easier to read at the expense of
exposing me to bugs like this.



I guess it's a tradeoff between syntactic complexity and logical
complexity. You have to think pretty hard to minimize use of the
protect stack.


I prefer to call it logical obscurity ;-)

The hard thinking consists in assessing whether or not the code between
the line where a new SEXP is allocated (line X) and the line where
it's put in a safe place (line Y) can trigger garbage collection.
Hard to figure out in general indeed, but not impossible! Problem
is that the result of this assessment is valid at a certain point
in time but might change in the future, even if your code has not
changed.

So a dangerous game for virtually zero benefits.



One recommendation might be to UNPROTECT() as soon as the pointer on
the top is unneeded, rather than trying to figure out the number to
pop just before returning to R.


If you PROTECT() in a loop, you definitely want to do that. Otherwise,
does it make a big difference?



One thing that got me is that the order in which C evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.


I got hit by this too long time ago but with defineVar() instead of
R_setAttrib():

  https://stat.ethz.ch/pipermail/r-devel/2008-January/048040.html

H.



Btw, I think my package RGtk2 got the record: 1952 errors. Luckily
almost all of them happened inside a few macros and autogenerated
code.


Martin



Cheers,

Aaron
___
Bioc-devel@r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel=DwICAg=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU=




This email message may contain legally privileged and/or...{{dropped:2}}

___
Bioc-devel@r-project.org mailing list

Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-06 Thread Michael Lawrence
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
 wrote:
> On 04/06/2017 05:33 AM, Aaron Lun wrote:
>>>
>>> The tool is not perfect, so assess each report carefully.
>>
>>
>> I also get a warning on almost every C++ function I've written, because
>> I use the following code to handle exceptions:
>>
>>  SEXP output=PROTECT(allocVector(...));
>>  try {
>>  // do something that might raise an exception
>>  } catch (std::exception& e) {
>>  UNPROTECT(1);
>>  throw; // break out of this part of the function
>>  }
>>  UNPROTECT(1);
>>  return output;
>>
>> Presumably the check doesn't account for transfer of control to the
>> catch block. I find that R itself is pretty good at complaining about
>> stack imbalances during execution of tests, examples, etc.
>>
>>> 'My' packages
>>> (Rsamtools, DirichletMultinomial) had several false positives (all
>>> associated with use of an attribute of a protected SEXP), one subtle
>>> problem (a symbol from a PROTECT'ed package name space; the symbol could
>>> in theory be an active binding and the value obtained not PROTECTed by
>>> the name space), and a genuine bug
>>>
>>> tag = NEW_CHARACTER(n);
>>> for (int j = 0; j < n; ++j)
>>> SET_STRING_ELT(tag, j, NA_STRING);
>>> if ('A' == aux[0]) {
>>> buf_A = R_alloc(2, sizeof(char));  # <<- bug
>>> buf_A[1] = '\0';
>>> }
>>> ...
>>> SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too late!
>>
>>
>> I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the
>> R_alloc call looks okay to me...
>
>
> yes, tag needs protection.
>
> I attributed the bug to R_alloc because I failed to reason that R_alloc
> (obviously) allocates and hence can trigger a garbage collection.
>
> Somehow it reflects my approach to PROTECTion, probably not shared by
> everyone. I like to PROTECT only when necessary, rather than
> indiscriminately. Probably this has no practical consequence in terms of
> performance, making the code a little easier to read at the expense of
> exposing me to bugs like this.
>

I guess it's a tradeoff between syntactic complexity and logical
complexity. You have to think pretty hard to minimize use of the
protect stack.

One recommendation might be to UNPROTECT() as soon as the pointer on
the top is unneeded, rather than trying to figure out the number to
pop just before returning to R.

One thing that got me is that the order in which C evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.

Btw, I think my package RGtk2 got the record: 1952 errors. Luckily
almost all of them happened inside a few macros and autogenerated
code.

> Martin
>
>>
>> Cheers,
>>
>> Aaron
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>
> This email message may contain legally privileged and/or...{{dropped:2}}
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-06 Thread Martin Morgan

On 04/06/2017 05:33 AM, Aaron Lun wrote:

The tool is not perfect, so assess each report carefully.


I also get a warning on almost every C++ function I've written, because
I use the following code to handle exceptions:

 SEXP output=PROTECT(allocVector(...));
 try {
 // do something that might raise an exception
 } catch (std::exception& e) {
 UNPROTECT(1);
 throw; // break out of this part of the function
 }
 UNPROTECT(1);
 return output;

Presumably the check doesn't account for transfer of control to the
catch block. I find that R itself is pretty good at complaining about
stack imbalances during execution of tests, examples, etc.


'My' packages
(Rsamtools, DirichletMultinomial) had several false positives (all
associated with use of an attribute of a protected SEXP), one subtle
problem (a symbol from a PROTECT'ed package name space; the symbol could
in theory be an active binding and the value obtained not PROTECTed by
the name space), and a genuine bug

tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char));  # <<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too late!


I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the
R_alloc call looks okay to me...


yes, tag needs protection.

I attributed the bug to R_alloc because I failed to reason that R_alloc 
(obviously) allocates and hence can trigger a garbage collection.


Somehow it reflects my approach to PROTECTion, probably not shared by 
everyone. I like to PROTECT only when necessary, rather than 
indiscriminately. Probably this has no practical consequence in terms of 
performance, making the code a little easier to read at the expense of 
exposing me to bugs like this.


Martin



Cheers,

Aaron
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel




This email message may contain legally privileged and/or...{{dropped:2}}

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-06 Thread Aaron Lun
> The tool is not perfect, so assess each report carefully.

I also get a warning on almost every C++ function I've written, because 
I use the following code to handle exceptions:

 SEXP output=PROTECT(allocVector(...));
 try {
 // do something that might raise an exception
 } catch (std::exception& e) {
 UNPROTECT(1);
 throw; // break out of this part of the function
 }
 UNPROTECT(1);
 return output;

Presumably the check doesn't account for transfer of control to the 
catch block. I find that R itself is pretty good at complaining about 
stack imbalances during execution of tests, examples, etc.

> 'My' packages
> (Rsamtools, DirichletMultinomial) had several false positives (all
> associated with use of an attribute of a protected SEXP), one subtle
> problem (a symbol from a PROTECT'ed package name space; the symbol could
> in theory be an active binding and the value obtained not PROTECTed by
> the name space), and a genuine bug
>
> tag = NEW_CHARACTER(n);
> for (int j = 0; j < n; ++j)
> SET_STRING_ELT(tag, j, NA_STRING);
> if ('A' == aux[0]) {
> buf_A = R_alloc(2, sizeof(char));  # <<- bug
> buf_A[1] = '\0';
> }
> ...
> SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too late!

I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the 
R_alloc call looks okay to me...

Cheers,

Aaron
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel