Thank you, Luke, for looking into it. Your knowledge of the GC is definitely helpful here. I have put my comments inline below.
Best,
Jan

On Wed, Nov 25, 2020 at 10:38 PM <luke-tier...@uiowa.edu> wrote:
>
> On Tue, 24 Nov 2020, Jan Gorecki wrote:
>
> > As for other calls to system(): I avoid calling system(). In the past I
> > had some (to get memory stats from the OS), but they were failing with
> > exactly the same issue. So yes, if I added a call to system() before
> > calling quit(), I believe it would fail with the same error.
> > At the same time I think (although I am not sure) that new allocations
> > made in R are working fine. So R seems to reserve some memory and can
> > continue to operate, while an external call like system() will fail. Maybe
> > it is like this by design, I don't know.
>
> Thanks for the report on quit(). We're exploring how to make the
> cleanup on exit more robust to low-memory situations like these.
>
> > Aside from this problem, which is easy to report thanks to the warning
> > message, I think that gc() is choking at the same time. I have tried
> > multiple times to make a reproducible example for it, without success; let
> > me try one more time.
> > It happens to manifest when there are 4e8+ unique characters/factors in
> > an R session. I am able to reproduce it using data.table and dplyr
> > (0.8.4, because 1.0.0+ fails even sooner), but using base R is not easy
> > because of the size. I described the problem briefly in:
> > https://github.com/h2oai/db-benchmark/issues/110
>
> Because of the design of R's character vectors, with each element
> allocated separately, R is never going to be great at handling huge
> numbers of distinct strings. But it can do an adequate job given
> enough memory to work with.
>
> When I run your GitHub issue example on a machine with around 500 Gb
> of RAM it seems to run OK; /usr/bin/time reports
>
>   2706.89user 161.89system 37:10.65elapsed 128%CPU (0avgtext+0avgdata 92180796maxresident)k
>   0inputs+103450552outputs (0major+38716351minor)pagefaults 0swaps
>
> So the memory footprint is quite large.
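[Editor's note: Luke's point about R allocating each character element separately (one CHARSXP per distinct string in the global string cache) can be sketched with a small, hypothetical stress test. This is not the db-benchmark script; the `sprintf` pattern and `n` below are invented for illustration, and `n` would have to be scaled toward 4e8, needing hundreds of GB of RAM, to approach the regime Jan describes.]

```r
## Hypothetical stress sketch (not the db-benchmark script).
## Each distinct string becomes its own CHARSXP in the global string
## cache, so memory use and GC work grow with the number of unique values.
n <- 1e6                              # deliberately small; 4e8 needs 100s of GB
x <- sprintf("id%09d", seq_len(n))    # n distinct strings
f <- as.factor(x)                     # hashes every unique string
print(object.size(x), units = "MB")   # rough footprint of the string vector
print(gc())                           # current heap usage and GC trigger levels
```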
> Using gc.time() it looks like
> about 1/3 of the time is in GC. Not ideal, and maybe could be improved
> on a bit, but probably not by much. The GC is basically doing an
> adequate job, given enough RAM.

Agreed, 1/3 is a lot but still acceptable, so strictly speaking this is not something that requires intervention. PS. I wasn't aware of gc.time(); it may be worth linking it from the "See Also" section of the gc() manual page.

> If you run this example on a system without enough RAM, or with other
> programs competing for RAM, you are likely to end up fighting with
> your OS/hardware's virtual memory system. When I try to run it on a
> 16Gb system it churns for an hour or so before getting killed, and
> /usr/bin/time reports a huge number of page faults:
>
>   312523816inputs+0outputs (24761285major+25762068minor)pagefaults 0swaps
>
> You are probably experiencing something similar.

Yes, this is exactly what I am experiencing. The machine is a bare-metal machine with 128 GB of memory; the csv is 50 GB and the data.frame 74 GB. In my case it churns for ~3h before it gets killed with SIGINT by the parent R process, which uses 3h as the timeout for this script. This is something I would like to see addressed, because the gc time is far bigger than the actual computation time. That is not really acceptable; I would prefer to raise an exception instead.

> There may be opportunities for more tuning of the GC to better handle
> running this close to memory limits, but I doubt the payoff would be
> worth the effort.

If you don't have plans/time to work on that anytime soon, then I can file this problem in Bugzilla so it doesn't get lost in the mailing list.

> Best,
>
> luke

> > It would help if gcinfo() could take FALSE/TRUE/2L, where 2L would print
> > even more information about gc, such as how much time each gc()
> > pass took and how many objects it had to check at each level.
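[Editor's note: for anyone else who, like Jan, had not come across gc.time(): a minimal sketch of measuring the share of elapsed time spent in GC around a workload. The 1/3 threshold is just Luke's figure from above, not an official limit, and the workload line is a placeholder.]

```r
## Measure the GC share of a workload using gc.time() and proc.time().
## Both return proc.time-style vectors; element 3 is elapsed wall time.
g0 <- gc.time(); t0 <- proc.time()
x <- as.character(runif(1e6))         # allocation-heavy placeholder workload
g1 <- gc.time(); t1 <- proc.time()
gc_share <- (g1[3] - g0[3]) / (t1[3] - t0[3])
cat(sprintf("fraction of elapsed time in GC: %.2f\n", gc_share))
if (!is.na(gc_share) && gc_share > 1/3)
  warning("GC took more than a third of elapsed time")
```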
> >
> > Best regards,
> > Jan
> >
> > On Tue, Nov 24, 2020 at 1:05 PM Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
> >>
> >> On 11/24/20 11:27 AM, Jan Gorecki wrote:
> >>> Thanks Bill for checking that.
> >>> It was my impression that the warnings are raised from some internal
> >>> system calls made when quitting R. At that point I don't have much
> >>> control over checking their return status.
> >>> Your suggestion looks good to me.
> >>>
> >>> Tomas, do you think this could help? Could this be implemented?
> >>
> >> I think this is a good suggestion. Deleting files on Unix was changed
> >> from system("rm") to doing it in C, and deleting the session directory
> >> should follow.
> >>
> >> It might also help in diagnosing your problem, but I don't think it would
> >> solve it. If the diagnostics in R work fine and the OS was so
> >> hopelessly out of memory that it couldn't run any more external
> >> processes, then really this is not a problem of R, but of having
> >> exhausted the resources. And it would be a coincidence that just this
> >> particular call to "system" at the end of the session did not work.
> >> Anything else could break as well close to the end of the script. This
> >> seems the most likely explanation to me.
> >>
> >> Do you get this warning repeatedly and reproducibly, at least in slightly
> >> different scripts, always at the very end, with the warning always coming
> >> from quit()? So that the "call" part of the warning message has
> >> .Internal(quit) like in the case you posted? Would adding another call to
> >> "system" before the call to "q()" work, with the return value checked? If
> >> it is always only the last call to "system" in "q()" that fails, then it
> >> is suspicious, perhaps an indication that some diagnostics in R are not correct.
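[Editor's note: Tomas's suggestion of probing with an extra system() call just before q() could look something like the hedged sketch below. Using `true` as the no-op command is an assumption; any cheap external command would do.]

```r
## Probe whether the OS can still fork/exec a shell before quitting.
## system() returns a nonzero status (127 when the shell itself cannot
## be run), which is the situation behind the quit() warning.
status <- system("true", ignore.stdout = TRUE, ignore.stderr = TRUE)
if (status != 0L)
  message("pre-quit probe: system() failed with status ", status,
          " -- likely cannot allocate memory to spawn a shell")
q("no", status = 0)
```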
> >> In that case, a
> >> reproducible example would be the key: either you could diagnose
> >> on your end what the problem is, or create a reproducible example that
> >> someone else can use to reproduce and debug.
> >>
> >> Best
> >> Tomas
> >>
> >>>
> >>> On Mon, Nov 23, 2020 at 7:10 PM Bill Dunlap <williamwdun...@gmail.com> wrote:
> >>>> The call to system() probably is an internal call used to delete the
> >>>> session's tempdir(). This sort of failure means that a potentially
> >>>> large amount of disk space is not being recovered when R is done.
> >>>> Perhaps R_CleanTempDir() could call R_unlink() instead of having a
> >>>> subprocess call 'rm -rf ...'. Then it could also issue a specific
> >>>> warning if it was impossible to delete all of tempdir(). (That should
> >>>> be very rare.)
> >>>>
> >>>> > q("no")
> >>>> Breakpoint 1, R_system (command=command@entry=0x7fffffffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
> >>>> 311     {
> >>>> (gdb) where
> >>>> #0  R_system (command=command@entry=0x7fffffffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
> >>>> #1  0x00005555557c30ec in R_CleanTempDir () at sys-std.c:1178
> >>>> #2  0x00005555557c31d7 in Rstd_CleanUp (saveact=<optimized out>, status=0, runLast=<optimized out>) at sys-std.c:1243
> >>>> #3  0x00005555557c593d in R_CleanUp (saveact=saveact@entry=SA_NOSAVE, status=status@entry=0, runLast=<optimized out>) at system.c:87
> >>>> #4  0x00005555556cc85e in do_quit (call=<optimized out>, op=<optimized out>, args=0x555557813f90, rho=<optimized out>) at main.c:1393
> >>>>
> >>>> -Bill
> >>>>
> >>>> On Mon, Nov 23, 2020 at 3:15 AM Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
> >>>>> On 11/21/20 6:51 PM, Jan Gorecki wrote:
> >>>>>> Dear R developers,
> >>>>>>
> >>>>>> Some of the fatter scripts (50+ GB of memory used by R) that I am
> >>>>>> running quit, when they finish, with q("no", status=0).
> >>>>>> Quite often it happens that there is extra stderr
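[Editor's note: the R-level counterpart of Bill's proposal (have R_CleanTempDir() use R_unlink() in C rather than a subprocess running 'rm -rf') is unlink(), which deletes recursively in-process and therefore cannot fail at fork/exec time. A small sketch, with a made-up directory name:]

```r
## Delete a directory tree without spawning a shell or 'rm'.
d <- file.path(tempdir(), "demo-cleanup")   # hypothetical scratch dir
dir.create(d, showWarnings = FALSE)
writeLines("x", file.path(d, "file.txt"))
unlink(d, recursive = TRUE)   # in-process recursive delete, no subprocess
stopifnot(!dir.exists(d))     # verify the tree is gone
```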
> >>>>>> output produced
> >>>>>> at the very end, which looks like this:
> >>>>>>
> >>>>>> Warning message:
> >>>>>> In .Internal(quit(save, status, runLast)) :
> >>>>>>   system call failed: Cannot allocate memory
> >>>>>>
> >>>>>> Is there any way to avoid this kind of warning? I am using stderr
> >>>>>> output for detecting failures in scripts, and this warning is a false
> >>>>>> positive for a failure.
> >>>>>>
> >>>>>> Maybe the quit function could wait a little bit longer, trying to
> >>>>>> allocate, before it raises this warning?
> >>>>> If you see this warning, some call to system() or system2() or similar,
> >>>>> which executes an external program, failed even to run a shell to run
> >>>>> that external program, because there was not enough memory. You should
> >>>>> be able to find out where it happens by checking the exit status of
> >>>>> system().
> >>>>>
> >>>>> Tomas
> >>>>>
> >>>>>> Best regards,
> >>>>>> Jan Gorecki
> >>>>>>
> >>>>>> ______________________________________________
> >>>>>> R-devel@r-project.org mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                     Phone: 319-335-3386
> Department of Statistics and           Fax:   319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                     email: luke-tier...@uiowa.edu
> Iowa City, IA 52242                    WWW: http://www.stat.uiowa.edu