In R 3.5 and later you should not need to gc() -- that should happen
automatically within the connections code.
Nevertheless, I would recommend redesigning your approach to avoid
hanging onto open file connections, as these are a scarce resource.
You can keep your temporary files around without having them open and
only open/close them on access, with the close run in an on.exit() or a
tryCatch/finally clause.
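For example, something along these lines (just a sketch; read_chunk and
the readBin() call are purely illustrative, not part of your package):

  ## Keep only the path around; open on access and let on.exit()
  ## guarantee the close, even if an error occurs.
  read_chunk <- function(path, n) {
      con <- file(path, open = "rb")
      on.exit(close(con), add = TRUE)
      readBin(con, what = "integer", n = n)
  }

The same thing can be written with tryCatch(..., finally = close(con))
if you prefer that form.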
Best,
luke
On Tue, 7 Aug 2018, Jan van der Laan wrote:
Dear Uwe,
(When replying to your message, I sent the reply to r-devel and not
r-package-devel, as Martin Maechler suggested that this thread would be a
better fit for r-devel.)
Thanks. In the example below I used rm() explicitly, but in general users
wouldn't do that.
One of the reasons for the large number of file handles is that sometimes
unnamed temporary objects are created. For example:
library(ldat)
library(lvec)
a <- lvec(10, "integer")
OPENFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec3214753f2af0'
b <- as_rvec(a[1:3])
OPENFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec32146a50f383'
OPENFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec3214484b652c'
print(b)
[1] 0 0 0
gc()
CLOSEFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec3214484b652c'
CLOSEFILE '/tmp/RtmpVqkDsw/file32145169fb06/lvec32146a50f383'
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 796936 42.6 1442291 77.1 1168576 62.5
Vcells 1519523 11.6 4356532 33.3 4740854 36.2
For debugging, I log when files are opened and closed. The call a[1:3] (which
creates a slice of a) creates two temporary objects [1]. These are only
deleted when I explicitly call gc(), or at some other essentially random
moment in time.
I hope this illustrates the problem better.
Best,
Jan
[1] One improvement would be to create fewer temporary files; often these
contain only very little information that is better kept in memory. But that
is only a partial solution.
On 07-08-18 15:24, Uwe Ligges wrote:
Why not add functionality that allows the user to delete the object and run the cleanup code?
Best,
Uwe Ligges
On 07.08.2018 14:26, Jan van der Laan wrote:
In my package I open handles to temporary files from C++; these handles are
returned to R through vptr objects. The files are deleted when the
corresponding R object is deleted and the garbage collector runs:
a <- lvec(10, "integer")
rm(a)
Then, when the garbage collector runs, the file is deleted. However, on some
platforms (probably those with lower limits on the maximum number of file
handles a process can have open), I run into the problem that the garbage
collector does not run often enough. In this case that means that another
package of mine that uses this package generates an error when its tests are
run.
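To make the mechanism concrete, here is a stripped-down sketch in plain R
(the package itself registers the finalizer from C++ on the pointer object;
new_tmpvec below is only illustrative):

  new_tmpvec <- function() {
      path <- tempfile()
      file.create(path)
      handle <- new.env(parent = emptyenv())
      handle$path <- path
      ## the temporary file is removed only when the handle is
      ## garbage collected
      reg.finalizer(handle, function(h) unlink(h$path))
      handle
  }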
The simplest solution is to add some calls to gc() in my tests. But a more
general/automatic solution would be nice.
I thought about something along the lines of:
robust_lvec <- function(...) {
  tryCatch({
    lvec(...)
  }, error = function(e) {
    gc()
    lvec(...)  # duplicated code
  })
}
i.e. try to open a file and, when that fails, call the garbage collector and
try again. However, this introduces duplicated code (in this case only one
line, but it can be more), and it doesn't help if it is another function
that tries to open a file.
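The duplication itself could be factored out into a small wrapper (again
just a sketch; with_gc_retry is not an existing function):

  ## Run f(...); if it fails, trigger a garbage collection and retry once.
  with_gc_retry <- function(f, ...) {
      tryCatch(f(...), error = function(e) {
          gc()
          f(...)
      })
  }

  robust_lvec <- function(...) with_gc_retry(lvec, ...)

but that still only patches my own functions, not other code that opens
files.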
Is there a better solution?
Thanks!
Jan
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa, Department of Statistics and Actuarial Science
241 Schaeffer Hall, Iowa City, IA 52242
Phone: 319-335-3386
Fax: 319-335-3017
email: luke-tier...@uiowa.edu
WWW: http://www.stat.uiowa.edu
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel