Re: [Rd] R object protection [Was: R-devel Digest, Vol 83, Issue 2]

2010-01-03 Thread Romain Francois

On 01/03/2010 01:34 AM, Simon Urbanek wrote:


On Jan 2, 2010, at 4:08 PM, Laurent Gautier wrote:


On 1/2/10 8:28 PM, Simon Urbanek wrote:


On Jan 2, 2010, at 12:17 PM, Laurent Gautier wrote:


On 1/2/10 5:56 PM, Duncan Murdoch wrote:

On 02/01/2010 11:36 AM, Laurent Gautier wrote:

[Disclaimer: what is below reflects my understanding from
reading the R source, others will correct where deemed
necessary]

On 1/2/10 12:00 PM, r-devel-requ...@r-project.org wrote:



(...)


Another possibility is to maintain your own list or environment
of objects, and just protect/preserve the list as a whole.


Interesting idea, this would let one do one's own
bookkeeping on the list/environment. How does R's garbage collector
check contained objects? (I am thinking about performance here, and
maybe cyclic references.)



You don't really want to care because the GC is global for all
objects (and cycles are supported by the GC in R) - so whether you
keep it yourself or Preserve/Release is practically irrelevant (the
protection stack is handled separately).


I guess that I'll have to know in order to understand that I don't really want 
to care. ;-)
The garbage collector must somehow know if an object is available for 
collection (and will have to check whether an object is PROTECTed or not, or 
Preserved or not).
I suppose that upon being called the garbage collector will first look into the 
PROTECTed and Preserved objects, mark them as unavailable for collection, then 
recursively mark objects contained in them.



The GC marks recursively from all known roots of which Preserved list is one of 
many and all elements of the protection stack are treated as such as well (FWIW 
the Preserved and protected list are in that order the last two). Since this 
involves (by definition) all live objects it doesn't matter to which other 
object you assign the node. The only detail is that protection stack does not 
change the generation (since there is no real node to assign to).
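To make the point about roots and cycles concrete, here is a toy mark phase in plain C++. It is not R's collector: Node, mark, count_live and demo_live_count are names made up for this illustration, and generations are left out entirely. It only shows that marking from roots copes with a cycle, and that an object survives whether it is a root itself or hangs off a list you keep yourself.

```cpp
#include <cstddef>
#include <vector>

// Toy heap node: "children" stands in for the references an R object holds.
struct Node {
    std::vector<Node*> children;
    bool marked = false;
};

// Mark everything reachable from n. The marked flag stops the recursion
// when a cycle leads back to a node that was already visited.
void mark(Node* n) {
    if (n == nullptr || n->marked) return;
    n->marked = true;
    for (Node* c : n->children) mark(c);
}

// Run the mark phase over a set of roots and count the surviving nodes.
std::size_t count_live(std::vector<Node>& heap, const std::vector<Node*>& roots) {
    for (Node& n : heap) n.marked = false;
    for (Node* r : roots) mark(r);
    std::size_t live = 0;
    for (const Node& n : heap) {
        if (n.marked) ++live;
    }
    return live;
}

// Four nodes: a and b reference each other (a cycle), own_list is a
// container kept by the user, and the fourth node is unreachable.
std::size_t demo_live_count(bool via_own_list) {
    std::vector<Node> heap(4);
    Node* a = &heap[0];
    Node* b = &heap[1];
    Node* own_list = &heap[2];
    a->children.push_back(b);
    b->children.push_back(a);
    std::vector<Node*> roots;
    if (via_own_list) {
        own_list->children.push_back(a);  // a is kept alive through the list
        roots.push_back(own_list);
    } else {
        roots.push_back(a);               // a is a root itself
    }
    return count_live(heap, roots);
}
```

In both variants a and b survive and the unreachable node does not; the only difference is that the container is also live in the second case.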



As for keeping your own list -- if you really care about performance
that much (to be honest in vast majority of cases it doesn't matter)
you have to be careful how you implement it. Technically the fastest
way is preallocated generic vector but it really depends on how you
deal with the access so you can easily perform worse than
Preserve/Release if you're not careful.


Releasing being of linear complexity, having few thousands of Preserved objects 
not being anticipated as an extraordinary situation, and Preserve/Release 
cycles being quite frequent, I start minding a bit about the performance. 
Keeping my own list would let me experiment with various strategies (and 
eventually offer



Sure - what I meant is that you have to optimize for one thing or the other so 
you have to be careful what you do.
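One possible shape for such a list, sketched in plain C++ under stated assumptions: SlotRegistry is a hypothetical name, Object is a stand-in for SEXP, and in a real package the slots would be the elements of one generic vector that is itself preserved once. The sketch optimizes for release: a free list of slot indices makes both operations constant time, at the price of the caller having to carry the slot handle around. Growth beyond the preallocated capacity is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical registry: preallocated slots plus a free list of slot indices.
template <typename Object>
class SlotRegistry {
public:
    explicit SlotRegistry(std::size_t capacity) : slots_(capacity, Object()) {
        // All slots start free; push in reverse so slot 0 is handed out first.
        for (std::size_t i = capacity; i > 0; --i) free_.push_back(i - 1);
    }

    // O(1): take a free slot, store the object, return the slot as a handle.
    // Assumes the registry is not full.
    std::size_t preserve(Object obj) {
        assert(!free_.empty());
        std::size_t slot = free_.back();
        free_.pop_back();
        slots_[slot] = obj;
        return slot;
    }

    // O(1): clear the slot by handle, with no search through the kept objects.
    void release(std::size_t slot) {
        assert(slot < slots_.size());
        slots_[slot] = Object();
        free_.push_back(slot);
    }

    std::size_t in_use() const { return slots_.size() - free_.size(); }
    Object get(std::size_t slot) const { return slots_[slot]; }

private:
    std::vector<Object> slots_;      // the kept objects
    std::vector<std::size_t> free_;  // indices of currently unused slots
};
```

A lookup by object rather than by handle would bring the linear scan back, which is the "one thing or the other" trade-off mentioned above.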



As a side note - the best way (IMHO) to deal with all those issues is
to use external pointers because a) you get very efficient C
finalizers b) you can directly (and very efficiently) tie lifespan of
other objects to the same SEXP and c) as guardians they can nicely
track other objects that hold them.


Thanks. I am not certain to follow everything. Are you suggesting that rather 
than Preserve-ing/Release-ing a list/environment that would act as a guardian 
for several objects, one should use an external pointer (to an arbitrary C 
pointer) ? In that case, how does one indicate that an external pointer acts as 
a container ?

Or are you suggesting that rather than Preserve-ing/Release-ing R objects one should use 
an external pointer acting as a proxy for a SEXP (argument prot in 
R_MakeExternalPtr(void *p, SEXP tag, SEXP prot) ) ?
(but in that case the external pointer will itself have to go through 
Preserve/Release cycles...)



I was guessing that you use this in conjunction with some C++ magic not just 
plain R objects and thus you have to deal with two life spans. From the other 
messages I think you are dealing with the simple situation of wrapping an R 
object as reference in the other system with explicit memory management (i.e. 
in C++ you have explicit new/delete life cycle) in which case you really don't 
need anything more than Preserve. It is more interesting when you want to track 
the life of R objects without imposing the life span - i.e when you want to 
know when an object in R is collected such that you can delete it from the 
other system (i.e. you don't explicitly retain it by the reference).

Cheers,
Simon


Many thanks for this clarification. Rcpp is using a jri wannabe 
approach. Essentially we have:


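#include <Rinternals.h>   // SEXP, R_PreserveObject(), R_ReleaseObject()

class RObject {
public:
    RObject(SEXP x) : x(x) { R_PreserveObject(x); }  // keep x alive while the wrapper exists
    ~RObject() { R_ReleaseObject(x); }               // let the GC reclaim x again
private:
    SEXP x;
};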
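One detail the outline above leaves open is what happens when such a wrapper is copied: if a copy only duplicates the pointer, two destructors release an object that was preserved once. The sketch below is not taken from Rcpp. It uses made-up stand-ins (Handle for SEXP, preserve_stub and release_stub for the real preserve/release calls) that merely count, so the balance can be checked without R.

```cpp
#include <map>

// Stand-ins for the real preserve/release calls: they only count how many
// times each handle is currently preserved. Handle is a stand-in for SEXP.
typedef int Handle;
static std::map<Handle, int> preserve_count;
void preserve_stub(Handle h) { ++preserve_count[h]; }
void release_stub(Handle h) { --preserve_count[h]; }

class Wrapper {
public:
    explicit Wrapper(Handle h) : h_(h) { preserve_stub(h_); }
    // A copy preserves again, so every wrapper owns exactly one release.
    Wrapper(const Wrapper& other) : h_(other.h_) { preserve_stub(h_); }
    Wrapper& operator=(const Wrapper& other) {
        if (this != &other) {
            preserve_stub(other.h_);  // preserve the new handle first
            release_stub(h_);         // then drop the old one
            h_ = other.h_;
        }
        return *this;
    }
    ~Wrapper() { release_stub(h_); }

private:
    Handle h_;
};

// Returns the preserve count of handle 7 after two wrappers have come and
// gone, or -1 if the count was not 2 while both were alive.
int demo_balance() {
    {
        Wrapper a(7);
        Wrapper b = a;
        if (preserve_count[7] != 2) return -1;
    }
    return preserve_count[7];  // both destructors have run by now
}
```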

For the record, I'm also going the other way (rJava wannabe) in the CPP 
package 
(http://romainfrancois.blog.free.fr/index.php?post/2009/12/22/CPP-package-%3A-exposing-C-objects) 
that wraps arbitrary C++ objects (currently STL containers) as 
external pointers. You might recognize some patterns here.


 # create the 

[Rd] R object protection [Was: R-devel Digest, Vol 83, Issue 2]

2010-01-02 Thread Simon Urbanek
On Jan 2, 2010, at 4:08 PM, Laurent Gautier wrote:

 On 1/2/10 8:28 PM, Simon Urbanek wrote:
 
 On Jan 2, 2010, at 12:17 PM, Laurent Gautier wrote:
 
 On 1/2/10 5:56 PM, Duncan Murdoch wrote:
 On 02/01/2010 11:36 AM, Laurent Gautier wrote:
 [Disclaimer: what is below reflects my understanding from
 reading the R source, others will correct where deemed
 necessary]
 
 On 1/2/10 12:00 PM, r-devel-requ...@r-project.org wrote:
 
 (...)
 
 Another possibility is to maintain your own list or environment
 of objects, and just protect/preserve the list as a whole.
 
 Interesting idea, this would let one do one's own
 bookkeeping on the list/environment. How does R's garbage collector
 check contained objects? (I am thinking about performance here, and
 maybe cyclic references.)
 
 
 You don't really want to care because the GC is global for all
 objects (and cycles are supported by the GC in R) - so whether you
 keep it yourself or Preserve/Release is practically irrelevant (the
 protection stack is handled separately).
 
 I guess that I'll have to know in order to understand that I don't really 
 want to care. ;-)
 The garbage collector must somehow know if an object is available for 
 collection (and will have to check whether an object is PROTECTed or not, or 
 Preserved or not).
 I suppose that upon being called the garbage collector will first look into 
 the PROTECTed and Preserved objects, mark them as unavailable for collection, 
 then recursively mark objects contained in them.
 

The GC marks recursively from all known roots of which Preserved list is one of 
many and all elements of the protection stack are treated as such as well (FWIW 
the Preserved and protected list are in that order the last two). Since this 
involves (by definition) all live objects it doesn't matter to which other 
object you assign the node. The only detail is that protection stack does not 
change the generation (since there is no real node to assign to).


 As for keeping your own list -- if you really care about performance
 that much (to be honest in vast majority of cases it doesn't matter)
 you have to be careful how you implement it. Technically the fastest
 way is preallocated generic vector but it really depends on how you
 deal with the access so you can easily perform worse than
 Preserve/Release if you're not careful.
 
 Releasing being of linear complexity, having few thousands of Preserved 
 objects not being anticipated as an extraordinary situation, and 
 Preserve/Release cycles being quite frequent, I start minding a bit about the 
 performance. Keeping my own list would let me experiment with various 
 strategies (and eventually offer
 

Sure - what I meant is that you have to optimize for one thing or the other so 
you have to be careful what you do.


 As a side note - the best way (IMHO) to deal with all those issues is
 to use external pointers because a) you get very efficient C
 finalizers b) you can directly (and very efficiently) tie lifespan of
 other objects to the same SEXP and c) as guardians they can nicely
 track other objects that hold them.
 
 Thanks. I am not certain to follow everything. Are you suggesting that rather 
 than Preserve-ing/Release-ing a list/environment that would act as a guardian 
 for several objects, one should use an external pointer (to an arbitrary C 
 pointer) ? In that case, how does one indicate that an external pointer acts 
 as a container ?
 
 Or are you suggesting that rather than Preserve-ing/Release-ing R objects one 
 should use an external pointer acting as a proxy for a SEXP (argument prot 
 in R_MakeExternalPtr(void *p, SEXP tag, SEXP prot) ) ?
 (but in that case the external pointer will itself have to go through 
 Preserve/Release cycles...)
 

I was guessing that you use this in conjunction with some C++ magic not just 
plain R objects and thus you have to deal with two life spans. From the other 
messages I think you are dealing with the simple situation of wrapping an R 
object as reference in the other system with explicit memory management (i.e. 
in C++ you have explicit new/delete life cycle) in which case you really don't 
need anything more than Preserve. It is more interesting when you want to track 
the life of R objects without imposing the life span - i.e when you want to 
know when an object in R is collected such that you can delete it from the 
other system (i.e. you don't explicitly retain it by the reference).

Cheers,
Simon



 
 
 
 HTH,
 
 
 L.
 
 
 
 
 Thanks,
 
 Romain
 