Luke,

We're actually adopting the first of your generic approaches.

As a more concrete description:

There are R objects representing survey data sets, with the data stored in
a database table.  The subset() method, when applied to these objects,
creates a new table indicating which rows of the data table are in the
subset -- we don't modify the original table, because that breaks the
call-by-value semantics. When the subset object in R goes out of scope, we
need to delete the extra database table.

I have been doing this with a finalizer on an environment that's part of
the subset object in R. This all worked fine with JDBC, but the native
database interface requires that all communications with the database come
in send/receive pairs. Since R is single-threaded, this would not normally
be an issue. However, since garbage collection can happen at any time, it
is possible that the send part of the finalizer query "drop table
_sbs_whatever" comes between the send and receive of some other query, and
the database connection then falls over. So I'm happy for the finalizer
to run at any time except during a small critical section of R code.
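A minimal sketch of the current arrangement follows. It is only an illustration: `send_query()` and `make_subset()` are hypothetical, and `send_query()` just records the SQL it is given, standing in for the native interface. `reg.finalizer()` attaches the cleanup to an environment stored inside the subset object. Once that object is unreachable, the `drop table` is issued at the next garbage collection, wherever in the program that collection happens.

```r
## Hypothetical stand-in for the native interface: it records each
## query instead of sending it to a real database.
query_log <- character(0)
send_query <- function(sql) {
  query_log <<- c(query_log, sql)
  invisible(NULL)
}

## Finalizer: receives the environment and drops its table directly.
## Defined at top level so its closure does not capture the handle.
drop_on_gc <- function(handle) {
  send_query(paste("drop table", handle$table))
}

## Hypothetical subset constructor: the environment carries the
## finalizer, so the table is dropped once the subset is unreachable.
make_subset <- function(table) {
  handle <- new.env()
  handle$table <- table
  reg.finalizer(handle, drop_on_gc)
  structure(list(handle = handle), class = "db_subset")
}

s <- make_subset("_sbs_example")
rm(s)
invisible(gc())  # the drop is sent here, or during any later collection
query_log
```

Any allocation can also trigger a collection. As a result, `drop_on_gc()` may run between the send and receive of an unrelated query, which is the failure described above.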

In this particular case the finalizer only issues "drop table" queries, and
it doesn't need to know if they succeed, so we can keep a lock in the
database connection and just store any "drop table" queries that arrive
during a database operation for later execution.   More generally, though,
the fact that no R operation is atomic with respect to garbage collection
seems to make it a bit difficult to use finalizers -- if you need a
finalizer, it will often be in order to access and free some external
resource, which is when the race conditions can matter.
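This workaround, which is also Luke's first suggestion, can be sketched as follows. All names here (`conn`, `send_receive()`, `queue_drop()`, `db_query()`, `make_subset()`) are hypothetical. `send_receive()` only records SQL in place of a real send/receive pair. The finalizer never talks to the database; it only queues a `drop table` statement. Every real query drains that queue before it runs, so a finalizer that fires in the middle of a query can only add to the queue.

```r
## Hypothetical connection object: 'pending' holds drop statements
## queued by finalizers (keyed by table name); 'log' stands in for
## the real database.
conn <- new.env()
conn$pending <- new.env()
conn$log <- character(0)

## Stand-in for one complete send/receive pair on the native interface.
send_receive <- function(sql) {
  conn$log <- c(conn$log, sql)
  invisible(NULL)
}

## Finalizer body: never talks to the database, just queues the drop.
queue_drop <- function(handle) {
  assign(handle$table, paste("drop table", handle$table),
         envir = conn$pending)
}

## All queries go through db_query(): first issue any queued drops,
## then the query itself, one complete send/receive pair at a time.
db_query <- function(sql) {
  while (length(keys <- ls(conn$pending, all.names = TRUE)) > 0) {
    key <- keys[1]
    drop_sql <- get(key, envir = conn$pending)
    rm(list = key, envir = conn$pending)
    send_receive(drop_sql)
  }
  send_receive(sql)
}

## Hypothetical subset constructor using the queuing finalizer.
make_subset <- function(table) {
  handle <- new.env()
  handle$table <- table
  reg.finalizer(handle, queue_drop)
  structure(list(handle = handle), class = "db_subset")
}

s <- make_subset("_sbs_example")
rm(s)
invisible(gc())                        # only queues the drop
db_query("select count(*) from survey")
conn$log                               # the drop, then the select
```

The queue is keyed by table name, so each finalizer adds a single binding with `assign()` instead of rewriting a shared vector. A finalizer that fires while `db_query()` is draining the queue therefore cannot cause an earlier entry to be lost. Any drops still queued when the session ends would need to be flushed when the connection is closed.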

What I was envisaging was something like

without_gc(expr)

to evaluate expr with the memory manager set to allocate memory (or attempt
to do so) without garbage collection. Even better would be if gc could still
run, but with weak references temporarily treated as strong, so that garbage
without finalizers would be collected but no finalizers would be triggered.
Using this facility would be inefficient, because it would allocate more
memory than necessary and would also mess with the tuning of the garbage
collector. But when communicating with other programs, it seems it would be
very useful to have some way of running a block of R code and knowing that
no other R code will run during it. User interrupts are another issue, but
they can be caught, and in any case I'm happy to fail when the user
presses CTRL-C.

     -thomas




On Fri, Feb 15, 2013 at 12:53 AM, <luke-tier...@uiowa.edu> wrote:

> It might help if you could be more specific about what the issue is --
> if they are out of scope why does it matter whether the finalizers
> run?
>
> Generically two approaches I can think of:
>
>     you keep track of when it is safe to fully run your finalizers and have
>     your finalizers put the objects on a linked list if it isn't safe to
>     run the finalizer now and clear the list each time you make a new one
>
>     keep track of your objects with a weak list and turn them into strong
>     references before your calls, then drop the list after.
>
> I'm pretty sure we don't have a mechanism for temporarily suspending
> running the finalizers but it is probably fairly easy to add if that
> is the only option.
>
> I might be able to think of other options with more details on the
> issue.
>
> Best,
>
> luke
>
>
> On Tue, 12 Feb 2013, Thomas Lumley wrote:
>
>  Is there some way to prevent finalizers running during a section of code?
>>
>> I have a package that includes R objects linked to database tables.  To
>> maintain the call-by-value semantics, tables are copied rather than
>> modified, and the extra tables are removed by finalizers during garbage
>> collection.
>>
>> However, if the garbage collection occurs in the middle of processing
>> another SQL query (which is relatively likely, since that's where the
>> memory allocations are) there are problems with the database interface.
>>
>> Since the guarantees for the finalizer are "at most once, not before the
>> object is out of scope" it seems harmless to be able to prevent finalizers
>> from running during a particular code block, but I can't see any way to do
>> it.
>>
>> Suggestions?
>>
>>    -thomas
>>
>>
>>
>>
> --
> Luke Tierney
> Chair, Statistics and Actuarial Science
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>



-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
