Re: [Rd] Inconsistent behavior for the C API's R_ParseVector()?

2019-12-14 Thread Simon Urbanek
Laurent,


> On Dec 14, 2019, at 5:29 PM, Laurent Gautier wrote:
>
> Hi Simon,
>
> Widespread errors would have caught my attention earlier, as that code,
> which uses only one initialization of the embedded R, is used quite a bit
> and is covered by quite a few unit tests. This is the only situation I am
> aware of in which an error occurs.
>

It may or may not be "widespread" - almost all R API functions can raise errors 
(e.g., unable to allocate). You'll only find out once they do and that's too 
late ;).


> What is a "correct context", or initial context, that the code should
> start from? Searching for "context" in the R-exts manual does not return
> much.
> 

It depends on which embedding API you use - see R-exts section 8.1. The two
options are run_Rmainloop() and R_ReplDLLinit(), which both set up the
top-level context with SETJMP. If you don't use either, then you have to use
one of the advanced R APIs that do it, such as R_ToplevelExec() or
R_UnwindProtect(); otherwise the point to abort to on error doesn't exist.
Embedding R is much more complex than many think ...
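
A toy model of the point above, written in Python only to show the control
flow: it is not how R is implemented, and every name in it (ToyLongJump,
context_stack, toy_error, toy_toplevel_exec) is made up for illustration. An
error needs a previously saved context to jump to; a wrapper in the spirit of
R_ToplevelExec() provides one and reports whether the call completed.

```python
class ToyLongJump(Exception):
    """Stand-in for a longjmp() to a saved context (hypothetical)."""


context_stack = []  # toy analogue of the chain of saved contexts


def toy_error(message):
    """Signal an error: jump to the innermost context if one exists."""
    if not context_stack:
        # No saved jump target. In C there would be nowhere valid to jump to;
        # the toy model just reports that fact.
        raise RuntimeError("no context to jump to: " + message)
    raise ToyLongJump(message)


def toy_toplevel_exec(func):
    """Run func() with a context in place; True if it completed normally."""
    context_stack.append("toplevel")
    try:
        func()
        return True
    except ToyLongJump:
        return False
    finally:
        context_stack.pop()


def failing_call():
    toy_error("attempt to use zero-length variable name")


print(toy_toplevel_exec(failing_call))  # error caught at the saved context
print(toy_toplevel_exec(lambda: None))  # normal completion
```

Calling failing_call() directly, with no context in place, is the situation
described above: the error has no point to abort to.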

Cheers,
Simon




Re: [Rd] Inconsistent behavior for the C API's R_ParseVector()?

2019-12-14 Thread Laurent Gautier
Hi Simon,

Widespread errors would have caught my attention earlier, as that code,
which uses only one initialization of the embedded R, is used quite a bit
and is covered by quite a few unit tests. This is the only situation I am
aware of in which an error occurs.

What is a "correct context", or initial context, that the code should start
from? Searching for "context" in the R-exts manual does not return much.

Best,

Laurent



Re: [Rd] Inconsistent behavior for the C API's R_ParseVector()?

2019-12-14 Thread Simon Urbanek
Laurent,

the main point here is that R_ParseVector(), just like any other R API
function, has to be called in a correct context since it can raise errors, so
the issue is that your C code has a bug: it does not set up R correctly (my
guess would be that you're not creating the initial context necessary in
embedded R). There are many different errors, and yours is just one of many
that can occur - any R API call that does allocation (and parsing obviously
does) can cause errors. Note that this is true for pretty much all R API
functions.

Cheers,
Simon




Re: [Rd] Inconsistent behavior for the C API's R_ParseVector()?

2019-12-14 Thread Laurent Gautier
On Mon, Dec 9, 2019 at 09:57, Tomas Kalibera wrote:

> On 12/9/19 2:54 PM, Laurent Gautier wrote:
>
>> On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera wrote:
>>
>>> On 12/7/19 10:32 PM, Laurent Gautier wrote:
>>>
>>>> Thanks for the quick response Tomas.
>>>>
>>>> The same error is indeed happening when trying to have a zero-length
>>>> variable name in an environment. The surprising bit is then "why is this
>>>> happening during parsing" (that is, why are variables assigned to an
>>>> environment)?
>>>
>>> The emitted R error (in the R console) is not a parse (syntax) error, but
>>> an error emitted during parsing when the parser tries to intern a name -
>>> look it up in a symbol table. Empty string is not allowed as a symbol name,
>>> and hence the error. In the call "list(''=1)", the empty name is what
>>> could eventually become a name of a local variable inside list(), even
>>> though not yet during parsing.
>>
>> Thanks Tomas.
>>
>> I guess this has to do with R expressions being lazily evaluated, and names
>> of arguments in a call are also part of the expression. Now the puzzling
>> part is why that is at all part of the parsing: I would have expected
>> R_ParseVector() to be restricted to parsing... Now it feels like
>> R_ParseVector() is performing parsing, and a first level of evaluation for
>> expressions that "should never work" (the empty name).
>
> Think of it as an exception in say Python. Some failures during parsing
> result in an exception (called error in R and implemented using a long
> jump). Any time you are calling into R you can get an error; out of memory
> is also signalled as an R error.
>


The surprising bit for me was that I had expected the function to solely
perform parsing. I did not expect an exception (and a jmp smashing the
stack) when the function concerned is in the C API, is parsing a string, and
is using a parameter (pointer) to store whether parsing was a failure or a
success.

Since you are making a comparison with Python, the distinction I am making
between parsing and evaluation seems to apply there. For example:

```
>>> import parser
>>> parser.expr('1+')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    1+
     ^
SyntaxError: unexpected EOF while parsing
>>> p = parser.expr('list(""=1)')
>>> p
<parser.st object at 0x...>
>>> eval(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: eval() arg 1 must be a string, bytes or code object

>>> list(""=1)
  File "<stdin>", line 1
SyntaxError: keyword can't be an expression
```
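
For readers on current Python versions: the `parser` module was deprecated in
Python 3.9 and removed in 3.10, so the session above no longer runs there. A
minimal sketch of the same experiment with the standard `ast` module follows
(the helper name `try_parse` is made up for illustration). With `ast.parse()`
both invalid inputs are rejected with a SyntaxError while the tree is built,
before anything is evaluated.

```python
import ast


def try_parse(source):
    """Parse `source` as an expression without evaluating it.

    Returns ("ok", tree) on success and ("syntax-error", message) otherwise.
    `try_parse` is a hypothetical helper, not part of any library.
    """
    try:
        tree = ast.parse(source, mode="eval")
    except SyntaxError as exc:
        return ("syntax-error", exc.msg)
    return ("ok", tree)


for src in ("1+", 'list(""=1)', "list(a=1)"):
    status, _ = try_parse(src)
    print(f"{src!r}: {status}")
```

The exact error messages differ between Python versions, so the sketch only
reports the status.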


>>> There is probably some error in how the external code is handling R
>>> errors (Fatal error: unable to initialize the JIT, stack smashing, etc)
>>> and possibly also how R is initialized before calling R_ParseVector().
>>> Probably you would get the same problem when running say
>>> "stop('myerror')". Please note R errors are implemented as long-jumps,
>>> so care has to be taken when calling into R; Writing R Extensions has
>>> more details (and section 8 specifically about embedding R). This is
>>> unlike parse (syntax) errors, which are signaled via the status returned
>>> from R_ParseVector().
>>
>> The issue is that the segfault (because of stack smashing, therefore
>> because of what I also suspect to be an uncontrolled jump) is happening
>> within the execution of R_ParseVector(). I would think that an issue with
>> the initialization of R is less likely because the project is otherwise
>> used a fair bit and is well covered by automated continuous tests.
>>
>> After looking more into R's gram.c I suspect that an execution context is
>> required for R_ParseVector() to work properly (to know where to jump
>> in case of error) when the parsing code decides to fail outside what it
>> thinks is a syntax error. If that is the case, this would make
>> R_ParseVector() function well when called from, say, a C extension to an
>> R package, but fail the way I am seeing it fail when called from an
>> embedded R.
>
> Yes, contexts are used internally to handle errors. For external use
> please see Writing R Extensions, section 6.12.
>

I have wrapped my call to R_ParseVector() in R_tryCatchError(), and this
seems to help me overcome the issue. Thanks for the pointer.

Best,


Laurent



Re: [Rd] Inconsistent behavior for the C API's R_ParseVector()?

2019-12-09 Thread Tomas Kalibera
On 12/9/19 2:54 PM, Laurent Gautier wrote:
>
> On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera wrote:
>
>> On 12/7/19 10:32 PM, Laurent Gautier wrote:
>>> Thanks for the quick response Tomas.
>>>
>>> The same error is indeed happening when trying to have a
>>> zero-length variable name in an environment. The surprising bit
>>> is then "why is this happening during parsing" (that is, why are
>>> variables assigned to an environment)?
>>
>> The emitted R error (in the R console) is not a parse (syntax)
>> error, but an error emitted during parsing when the parser tries
>> to intern a name - look it up in a symbol table. Empty string is
>> not allowed as a symbol name, and hence the error. In the call
>> "list(''=1)", the empty name is what could eventually become a
>> name of a local variable inside list(), even though not yet during
>> parsing.
>
> Thanks Tomas.
>
> I guess this has to do with R expressions being lazily evaluated, and
> names of arguments in a call are also part of the expression. Now the
> puzzling part is why that is at all part of the parsing: I would have
> expected R_ParseVector() to be restricted to parsing... Now it feels
> like R_ParseVector() is performing parsing, and a first level of
> evaluation for expressions that "should never work" (the empty name).

Think of it as an exception in say Python. Some failures during parsing
result in an exception (called error in R and implemented using a long
jump). Any time you are calling into R you can get an error; out of
memory is also signalled as an R error.
>
>> There is probably some error in how the external code is handling
>> R errors (Fatal error: unable to initialize the JIT, stack
>> smashing, etc) and possibly also how R is initialized before
>> calling R_ParseVector(). Probably you would get the same problem
>> when running say "stop('myerror')". Please note R errors are
>> implemented as long-jumps, so care has to be taken when calling
>> into R; Writing R Extensions has more details (and section 8
>> specifically about embedding R). This is unlike parse (syntax)
>> errors, which are signaled via the status returned from
>> R_ParseVector().
>
> The issue is that the segfault (because of stack smashing, therefore
> because of what I also suspect to be an uncontrolled jump) is
> happening within the execution of R_ParseVector(). I would think that
> an issue with the initialization of R is less likely because the
> project is otherwise used a fair bit and is well covered by automated
> continuous tests.
>
> After looking more into R's gram.c I suspect that an execution context
> is required for R_ParseVector() to work properly (to know where
> to jump in case of error) when the parsing code decides to fail
> outside what it thinks is a syntax error. If that is the case, this
> would make R_ParseVector() function well when called from, say, a C
> extension to an R package, but fail the way I am seeing it fail when
> called from an embedded R.

Yes, contexts are used internally to handle errors. For external use 
please see Writing R Extensions, section 6.12.

Best
Tomas


Re: [Rd] Inconsistent behavior for the C API's R_ParseVector()?

2019-12-09 Thread Laurent Gautier
On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera wrote:

> On 12/7/19 10:32 PM, Laurent Gautier wrote:
>
>> Thanks for the quick response Tomas.
>>
>> The same error is indeed happening when trying to have a zero-length
>> variable name in an environment. The surprising bit is then "why is this
>> happening during parsing" (that is, why are variables assigned to an
>> environment)?
>
> The emitted R error (in the R console) is not a parse (syntax) error, but
> an error emitted during parsing when the parser tries to intern a name -
> look it up in a symbol table. Empty string is not allowed as a symbol name,
> and hence the error. In the call "list(''=1)", the empty name is what
> could eventually become a name of a local variable inside list(), even
> though not yet during parsing.
>

Thanks Tomas.

I guess this has to do with R expressions being lazily evaluated, and names
of arguments in a call are also part of the expression. Now the puzzling part
is why that is at all part of the parsing: I would have expected
R_ParseVector() to be restricted to parsing... Now it feels like
R_ParseVector() is performing parsing, and a first level of evaluation for
expressions that "should never work" (the empty name).

> There is probably some error in how the external code is handling R errors
> (Fatal error: unable to initialize the JIT, stack smashing, etc) and
> possibly also how R is initialized before calling R_ParseVector(). Probably
> you would get the same problem when running say "stop('myerror')". Please
> note R errors are implemented as long-jumps, so care has to be taken when
> calling into R; Writing R Extensions has more details (and section 8
> specifically about embedding R). This is unlike parse (syntax) errors,
> which are signaled via the status returned from R_ParseVector().
>

The issue is that the segfault (because of stack smashing, therefore
because of what I also suspect to be an uncontrolled jump) is happening
within the execution of R_ParseVector(). I would think that an issue with
the initialization of R is less likely because the project is otherwise
used a fair bit and is well covered by automated continuous tests.

After looking more into R's gram.c I suspect that an execution context is
required for R_ParseVector() to work properly (to know where to jump
in case of error) when the parsing code decides to fail outside what it
thinks is a syntax error. If that is the case, this would make
R_ParseVector() function well when called from, say, a C extension to an
R package, but fail the way I am seeing it fail when called from an
embedded R.

Best,

Laurent


Re: [Rd] Inconsistent behavior for the C API's R_ParseVector()?

2019-12-09 Thread Tomas Kalibera
On 12/7/19 10:32 PM, Laurent Gautier wrote:
> Thanks for the quick response Tomas.
>
> The same error is indeed happening when trying to have a zero-length
> variable name in an environment. The surprising bit is then "why is
> this happening during parsing" (that is, why are variables assigned to
> an environment)?

The emitted R error (in the R console) is not a parse (syntax) error,
but an error emitted during parsing when the parser tries to intern a
name - look it up in a symbol table. Empty string is not allowed as a
symbol name, and hence the error. In the call "list(''=1)", the empty
name is what could eventually become a name of a local variable inside
list(), even though not yet during parsing.

There is probably some error in how the external code is handling R
errors (Fatal error: unable to initialize the JIT, stack smashing, etc)
and possibly also how R is initialized before calling R_ParseVector().
Probably you would get the same problem when running say
"stop('myerror')". Please note R errors are implemented as long-jumps,
so care has to be taken when calling into R; Writing R Extensions has
more details (and section 8 specifically about embedding R). This is
unlike parse (syntax) errors, which are signaled via the status returned
from R_ParseVector().
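
The difference between the two kinds of failure can be pictured with a toy
sketch in Python. This is not R's implementation, and none of these names
exist in R or rpy2: `toy_parse` reports syntax problems through a returned
status, much as R_ParseVector() reports them through its status, but raises
an exception for an empty argument name, standing in for the R error and its
long jump. `protected_parse` stands in for a protecting wrapper that has a
handler in place for that second kind of failure.

```python
import re

PARSE_OK = "PARSE_OK"
PARSE_ERROR = "PARSE_ERROR"


class ToyRError(Exception):
    """Stand-in for an R error (a long jump in real R); hypothetical."""


# Toy grammar: name(argname=number), where argname may be single-quoted.
_CALL = re.compile(r"""^(\w+)\((?:'([^']*)'|(\w+))=(\d+)\)$""")


def toy_parse(text):
    """Return (status, result); raise ToyRError for an empty argument name."""
    match = _CALL.match(text)
    if match is None:
        # Syntax problem: reported through the status, no exception.
        return (PARSE_ERROR, None)
    func, quoted, bare, value = match.groups()
    argname = quoted if quoted is not None else bare
    if argname == "":
        # Syntactically fine, but "interning" the empty name fails.
        raise ToyRError("attempt to use zero-length variable name")
    return (PARSE_OK, (func, argname, int(value)))


def protected_parse(text):
    """Call toy_parse() with a handler in place for ToyRError."""
    try:
        return toy_parse(text)
    except ToyRError as err:
        return ("R_ERROR", str(err))


print(protected_parse("list(a=1)"))     # parsed normally
print(protected_parse("list(''=1+"))    # syntax problem, via status
print(protected_parse("list(''=123)"))  # error raised, caught by the wrapper
```

A caller that only checks the returned status, and has nothing in place to
catch the exception, handles the second input but not the third.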

Best,
Tomas

>
> We are otherwise aware that the error is not occurring in the R
> console, but can be traced to a call to R_ParseVector() in R's C API
> (https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509).
>
> Our specific setup is calling an embedded R from Python, using the
> cffi library. An error on our end was the first possibility considered,
> but the puzzling specificity of the error (as shown below, other
> parsing errors are handled properly) and the difficulty tracing what
> is happening in R_ParseVector() made me ask whether someone on this
> list had a suggestion about the possible issue.
>
> ```
> >>> import rpy2.rinterface as ri
> >>> ri.initr()
> >>> e = ri.parse("list(''=1+")
> ---
> RParsingError                             Traceback (most recent call last)
> >>> e = ri.parse("list(''=123")
> R[write to console]: Error: attempt to use zero-length variable name
> R[write to console]: Fatal error: unable to initialize the JIT
>
> *** stack smashing detected ***: <unknown> terminated
> ```
>
> On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera wrote:
>
> Dear Laurent,
>
> could you please provide a complete reproducible example where parsing
> results in a crash of R? Calling parse(text="list(''=123") from R works
> fine for me (gives Error: attempt to use zero-length variable name).
>
> I don't think the problem you observed could be related to the memory
> leak. The leak is on the heap, not stack.
>
> Zero-length names of elements in a list are allowed. They are not the
> same thing as zero-length variables in an environment. If you try to
> convert "lst" from your example to an environment, you would get the
> error (attempt to use zero-length variable name).
>
> Best
> Tomas
>
>
> On 11/30/19 11:55 PM, Laurent Gautier wrote:
> > Hi again,
> >
> > Besides R_ParseVector()'s possibly inconsistent behavior, R's handling of
> > zero-length named elements does not seem consistent either:
> >
> > ```
> >> lst <- list()
> >> lst[[""]] <- 1
> >> names(lst)
> > [1] ""
> >> list("" = 1)
> > Error: attempt to use zero-length variable name
> > ```
> >
> > Should the parser be made to accept as valid what is otherwise possible
> > when using `[[<-`?
> >
> >
> > Best,
> >
> > Laurent
> >
> >
> >
> > On Sat, Nov 30, 2019 at 17:33, Laurent Gautier wrote:
> >
> >> I found the following code comment in `src/main/gram.c`:
> >>
> >> ```
> >>
> >> /* Memory leak
> >>
> >> yyparse(), as generated by bison, allocates extra space for the parser
> >> stack using malloc(). Unfortunately this means that there is a memory
> >> leak in case of an R error (long-jump). In principle, we could define
> >> yyoverflow() to relocate the parser stacks for bison and allocate say on
> >> the R heap, but yyoverflow() is undocumented and somewhat complicated
> >> (we would have to replicate some macros from the generated parser here).
> >> The same problem exists at least in the Rd and LaTeX parsers in tools.
> >> */
> >>
> >> ```
> >>
> >> Could this be related to the issue?
> >>
> >> On Sat, Nov 30, 2019 at 14:04, Laurent Gautier wrote:
> >>
> >>> Hi,
> >>>
> >>> The behavior of
> >>> ```
> >>> SEXP R_ParseVector(SEXP, int, 

Re: [Rd] Inconsistent behavior for the C AP's R_ParseVector() ?

2019-12-07 Thread Laurent Gautier
Thanks for the quick response Tomas.

The same error is indeed happening when trying to have a zero-length
variable name in an environment. The surprising bit is then "why is this
happening during parsing" (that is, why are variables being assigned to an
environment)?

We are otherwise aware that the error is not occurring in the R console,
but can be traced to a call to R_ParseVector() in R's C API
(https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509).

Our specific setup is calling an embedded R from Python, using the cffi
library. An error on our end was the first possibility considered, but the
puzzling specificity of the error (as shown below, other parsing errors are
handled properly) and the difficulty of tracing what is happening in
R_ParseVector() made me ask whether someone on this list had a suggestion
about the possible issue.

```
>>> import rpy2.rinterface as ri
>>> ri.initr()
>>> e = ri.parse("list(''=1+")
---
RParsingError  Traceback (most recent call last)
>>> e = ri.parse("list(''=123")
R[write to console]: Error: attempt to use zero-length variable name
R[write to console]: Fatal error: unable to initialize the JIT

*** stack smashing detected ***:  terminated
```


On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera wrote:

> Dear Laurent,
>
> could you please provide a complete reproducible example where parsing
> results in a crash of R? Calling parse(text="list(''=123") from R works
> fine for me (gives Error: attempt to use zero-length variable name).
>
> I don't think the problem you observed could be related to the memory
> leak. The leak is on the heap, not stack.
>
> Zero-length names of elements in a list are allowed. They are not the
> same thing as zero-length variables in an environment. If you try to
> convert "lst" from your example to an environment, you would get the
> error (attempt to use zero-length variable name).
>
> Best
> Tomas
>
>
> On 11/30/19 11:55 PM, Laurent Gautier wrote:
> > Hi again,
> >
> > Besides R_ParseVector()'s possibly inconsistent behavior, R's handling of
> > zero-length named elements does not seem consistent either:
> >
> > ```
> >> lst <- list()
> >> lst[[""]] <- 1
> >> names(lst)
> > [1] ""
> >> list("" = 1)
> > Error: attempt to use zero-length variable name
> > ```
> >
> > Should the parser be made to accept as valid what is otherwise possible
> > when using `[[<-`?
> >
> >
> > Best,
> >
> > Laurent
> >
> >
> >
> > On Sat, Nov 30, 2019 at 17:33, Laurent Gautier wrote:
> >
> >> I found the following code comment in `src/main/gram.c`:
> >>
> >> ```
> >>
> >> /* Memory leak
> >>
> >> yyparse(), as generated by bison, allocates extra space for the parser
> >> stack using malloc(). Unfortunately this means that there is a memory
> >> leak in case of an R error (long-jump). In principle, we could define
> >> yyoverflow() to relocate the parser stacks for bison and allocate say on
> >> the R heap, but yyoverflow() is undocumented and somewhat complicated
> >> (we would have to replicate some macros from the generated parser here).
> >> The same problem exists at least in the Rd and LaTeX parsers in tools.
> >> */
> >>
> >> ```
> >>
> >> Could this be related to the issue?
> >>
> >> On Sat, Nov 30, 2019 at 14:04, Laurent Gautier wrote:
> >>
> >>> Hi,
> >>>
> >>> The behavior of
> >>> ```
> >>> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
> >>> ```
> >>> defined in `src/include/R_ext/Parse.h` appears to be inconsistent
> >>> depending on the string to be parsed.
> >>>
> >>> Trying to parse a string such as `"list(''=1+"` sets the
> >>> `ParseStatus` to report an incomplete parse, but trying to parse
> >>> `"list(''=123"` will result in R sending a message to the console
> >>> (followed by a crash):
> >>>
> >>> ```
> >>> R[write to console]: Error: attempt to use zero-length variable name
> >>> R[write to console]: Fatal error: unable to initialize the JIT
> >>> *** stack smashing detected ***:  terminated
> >>> ```
> >>>
> >>> Is there a reason for the difference in behavior, and is there a
> >>> workaround?
> >>>
> >>> Thanks,
> >>>
> >>>
> >>> Laurent
> >>>
> >>>
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Inconsistent behavior for the C AP's R_ParseVector() ?

2019-12-02 Thread Tomas Kalibera

Dear Laurent,

could you please provide a complete reproducible example where parsing 
results in a crash of R? Calling parse(text="list(''=123") from R works 
fine for me (gives Error: attempt to use zero-length variable name).


I don't think the problem you observed could be related to the memory 
leak. The leak is on the heap, not stack.


Zero-length names of elements in a list are allowed. They are not the 
same thing as zero-length variables in an environment. If you try to 
convert "lst" from your example to an environment, you would get the 
error (attempt to use zero-length variable name).


Best
Tomas


On 11/30/19 11:55 PM, Laurent Gautier wrote:

Hi again,

Besides R_ParseVector()'s possibly inconsistent behavior, R's handling of
zero-length named elements does not seem consistent either:

```
> lst <- list()
> lst[[""]] <- 1
> names(lst)
[1] ""
> list("" = 1)
Error: attempt to use zero-length variable name
```

Should the parser be made to accept as valid what is otherwise possible
when using `[[<-`?


Best,

Laurent



On Sat, Nov 30, 2019 at 17:33, Laurent Gautier wrote:


I found the following code comment in `src/main/gram.c`:

```

/* Memory leak

yyparse(), as generated by bison, allocates extra space for the parser
stack using malloc(). Unfortunately this means that there is a memory
leak in case of an R error (long-jump). In principle, we could define
yyoverflow() to relocate the parser stacks for bison and allocate say on
the R heap, but yyoverflow() is undocumented and somewhat complicated
(we would have to replicate some macros from the generated parser here).
The same problem exists at least in the Rd and LaTeX parsers in tools.
*/

```

Could this be related to the issue?

On Sat, Nov 30, 2019 at 14:04, Laurent Gautier wrote:


Hi,

The behavior of
```
SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
```
defined in `src/include/R_ext/Parse.h` appears to be inconsistent
depending on the string to be parsed.

Trying to parse a string such as `"list(''=1+"` sets the
`ParseStatus` to report an incomplete parse, but trying to parse
`"list(''=123"` will result in R sending a message to the console
(followed by a crash):

```
R[write to console]: Error: attempt to use zero-length variable name
R[write to console]: Fatal error: unable to initialize the JIT
*** stack smashing detected ***:  terminated
```

Is there a reason for the difference in behavior, and is there a workaround?

Thanks,


Laurent





Re: [Rd] Inconsistent behavior for the C AP's R_ParseVector() ?

2019-11-30 Thread Laurent Gautier
Hi again,

Besides R_ParseVector()'s possibly inconsistent behavior, R's handling of
zero-length named elements does not seem consistent either:

```
> lst <- list()
> lst[[""]] <- 1
> names(lst)
[1] ""
> list("" = 1)
Error: attempt to use zero-length variable name
```

Should the parser be made to accept as valid what is otherwise possible
when using `[[<-`?


Best,

Laurent



On Sat, Nov 30, 2019 at 17:33, Laurent Gautier wrote:

> I found the following code comment in `src/main/gram.c`:
>
> ```
>
> /* Memory leak
>
> yyparse(), as generated by bison, allocates extra space for the parser
> stack using malloc(). Unfortunately this means that there is a memory
> leak in case of an R error (long-jump). In principle, we could define
> yyoverflow() to relocate the parser stacks for bison and allocate say on
> the R heap, but yyoverflow() is undocumented and somewhat complicated
> (we would have to replicate some macros from the generated parser here).
> The same problem exists at least in the Rd and LaTeX parsers in tools.
> */
>
> ```
>
> Could this be related to the issue?
>
> On Sat, Nov 30, 2019 at 14:04, Laurent Gautier wrote:
>
>> Hi,
>>
>> The behavior of
>> ```
>> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
>> ```
>> defined in `src/include/R_ext/Parse.h` appears to be inconsistent
>> depending on the string to be parsed.
>>
>> Trying to parse a string such as `"list(''=1+"` sets the
>> `ParseStatus` to report an incomplete parse, but trying to parse
>> `"list(''=123"` will result in R sending a message to the console
>> (followed by a crash):
>>
>> ```
>> R[write to console]: Error: attempt to use zero-length variable name
>> R[write to console]: Fatal error: unable to initialize the JIT
>> *** stack smashing detected ***:  terminated
>> ```
>>
>> Is there a reason for the difference in behavior, and is there a workaround?
>>
>> Thanks,
>>
>>
>> Laurent
>>
>>



Re: [Rd] Inconsistent behavior for the C AP's R_ParseVector() ?

2019-11-30 Thread Laurent Gautier
I found the following code comment in `src/main/gram.c`:

```

/* Memory leak

yyparse(), as generated by bison, allocates extra space for the parser
stack using malloc(). Unfortunately this means that there is a memory
leak in case of an R error (long-jump). In principle, we could define
yyoverflow() to relocate the parser stacks for bison and allocate say on
the R heap, but yyoverflow() is undocumented and somewhat complicated
(we would have to replicate some macros from the generated parser here).
The same problem exists at least in the Rd and LaTeX parsers in tools.
*/

```

Could this be related to the issue?

On Sat, Nov 30, 2019 at 14:04, Laurent Gautier wrote:

> Hi,
>
> The behavior of
> ```
> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
> ```
> defined in `src/include/R_ext/Parse.h` appears to be inconsistent
> depending on the string to be parsed.
>
> Trying to parse a string such as `"list(''=1+"` sets the
> `ParseStatus` to report an incomplete parse, but trying to parse
> `"list(''=123"` will result in R sending a message to the console
> (followed by a crash):
>
> ```
> R[write to console]: Error: attempt to use zero-length variable name
> R[write to console]: Fatal error: unable to initialize the JIT
> *** stack smashing detected ***:  terminated
> ```
>
> Is there a reason for the difference in behavior, and is there a workaround?
>
> Thanks,
>
>
> Laurent
>
>



[Rd] Inconsistent behavior for the C AP's R_ParseVector() ?

2019-11-30 Thread Laurent Gautier
Hi,

The behavior of
```
SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
```
defined in `src/include/R_ext/Parse.h` appears to be inconsistent depending
on the string to be parsed.

Trying to parse a string such as `"list(''=1+"` sets the
`ParseStatus` to report an incomplete parse, but trying to parse
`"list(''=123"` will result in R sending a message to the console
(followed by a crash):

```
R[write to console]: Error: attempt to use zero-length variable name
R[write to console]: Fatal error: unable to initialize the JIT
*** stack smashing detected ***:  terminated
```

Is there a reason for the difference in behavior, and is there a workaround?

Thanks,


Laurent
