>>>>> Henrik Bengtsson >>>>> on Thu, 4 May 2023 18:38:14 +0200 writes:
> On Thu, May 4, 2023 at 3:02 PM Serguei Sokol via R-devel > <r-devel@r-project.org> wrote: >> >> Le 03/05/2023 à 01:25, Henrik Bengtsson a écrit : > Along >> the lines of calling R_CheckUserInterrupt() only onces in >> a while: >> > >> >> OTOH, in the past we have had to *disable* >> R_CheckUserInterrupt() >> in parts of R's code because it >> was too expensive, >> {see current >> src/main/{seq.c,unique.c} for a series of commented-out >> >> R_CheckUserInterrupt() for such speed-loss reasons} > >> First, here are links to these two files viewable online: >> > >> > * >> https://github.com/wch/r-source/blob/trunk/src/main/seq.c >> > >> > * >> https://github.com/wch/r-source/blob/trunk/src/main/unique.c >> > >> > When not commented out, R_CheckUserInterrupt() would >> have been called > every 1,000,000 times per: >> > >> > /* interval at which to check interrupts */ > #define >> NINTERRUPT 1000000 >> > >> > and >> > >> > if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt() >> > >> > in each iteration. That '(i+1) % NINTERRUPT == 0' >> expression can be > quite expensive too. I vaguely >> remember a hack that was mentioned on this list as close >> to 0-cost. It looked something like: >> >> iint = NINTERRUPT; for (...) { if (--iint == 0) { >> R_CheckUserInterrupt(); iint = NINTERRUPT; } } >> >> Best, Serguei. >> > Alternatively, one can increment a counter and reset to >> zero after > calling R_CheckUserInterrupt(); I think >> that's equally performant. > Yes, that's the one, e.g. Tomas K migrated some "modulo" > ones in R-devel to this one yesterday > (https://github.com/wch/r-source/commit/1ca6c6c6246629c6a98a526a2906595e5cfcd45e). > /Henrik Indeed. I have now (svn rev 84399) committed a version Ivan's patch which uses Tomas' approach (I think) in all cases, making uniformly use of this macro (derived from a version of what Tomas used): #define IF_IC_R_CheckUserInterrupt() \ if(!(--ic)) { \ R_CheckUserInterrupt(); \ ic = 9999; \ } I confirm that I now can interrupt Ivan's example. The interrupt is not handled immediately but within a second or so. Martin >> > However, if we change the code to use NINTERRUPT > = >> 2^k where k = {1, 2, ...}, say >> > >> > #define NINTERRUPT 1048576 >> > >> > the compiler would optimize the condition to use "the >> modulo of powers > of 2 can alternatively be expressed as >> a bitwise AND operation" > (Thomas Lumley, 2015-06-15). >> The speedup is quite impressive, cf. > >> <https://www.jottr.org/2015/06/05/checkuserinterrupt/>. >> > Alternatively, one can increment a counter and reset to >> zero after > calling R_CheckUserInterrupt(); I think >> that's equally performant. >> > >> > Regarding making serialize() / unserialize() >> interruptible: I think > can be a good idea since we work >> with larger objects these days. > However, if we >> implement this, we probably have to consider what > >> happens when an interrupt happens. For example, transfers >> between a > client and a server are no longer atomic at >> this level, which means we > might end up in a corrupt >> state. This may, for instance, happen to > database >> transactions, and PSOCK parallel worker communication. A >> > quick fix would be to use base::suspendInterrupts(), >> but better would > of course be to handle interrupts >> gracefully. >> > >> > My $.02 + $0.02 >> > >> > /Henrik >> > >> > On Tue, May 2, 2023 at 3:56 PM Jeroen Ooms >> <jeroeno...@gmail.com> wrote: >> On Tue, May 2, 2023 at >> 3:29 PM Martin Maechler >> <maech...@stat.math.ethz.ch> >> wrote: >>>>>>>> Ivan Krylov >>>>>>>> on Tue, 2 May 2023 >> 14:59:36 +0300 writes: >>> > В Sat, 29 Apr 2023 00:00:02 >> +0000 >>> > Dario Strbenac via R-devel >> <r-devel@r-project.org> пишет: >> >>> >> >>> >> Could save.image() be redesigned so that it >> promptly responds to >>> >> Ctrl+C? It prevents the >> command line from being used for a number of >>> >> hours >> if the contents of the workspace are large. >> >>> >> >>> > This is ultimately caused by serialize() being >> non-interruptible. A >>> > relatively simple way to hang >> an R session for a long-ish time would >>> > therefore >> be: >> >>> >> >>> > f <- xzfile(nullfile(), 'a+b') >>> > x <- rep(0, >> 1e9) # approx. 8 gigabytes, adjust for your RAM size >>> >> > serialize(x, f) >>> > close(f) >> >>> >> >>> > This means that calling R_CheckUserInterrupt() >> between saving >>> > individual objects is not enough: R >> also needs to check for interrupts >>> > while saving >> sufficiently long vectors. >> >>> >> >>> > Since the serialize() infrastructure is carefully >> written to avoid >>> > resource leaks on allocation >> failures, it looks relatively safe to >>> > liberally >> sprinkle R_CheckUserInterrupt() where it makes sense to >> do >>> > so, i.e. once per WriteItem() (which calls >> itself recursively and >>> > non-recursively) and once >> per every downstream for loop iteration. >>> > Valgrind >> doesn't show any new leaks if I apply the patch, >> interrupt >>> > serialize() and then exit. R also passes >> make check after the applied >>> > patch. >> >>> >> >>> > Do these changes make sense, or am I overlooking >> some other problem? >> >>> >> >>> Thank you, Ivan! >> >>> >> >>> They do make sense... but : >> >>> >> >>> OTOH, in the past we have had to *disable* >> R_CheckUserInterrupt() >>> in parts of R's code because >> it was too expensive, >>> {see current >> src/main/{seq.c,unique.c} for a series of commented-out >> >>> R_CheckUserInterrupt() for such speed-loss reasons} >> >>> >> >>> so adding these may need a lot of care when we >> simultaneously >>> want to remain efficient for "morally >> valid" use of serialization... >>> where we really don't >> want to pay too much of a premium. >> Alternatively, one >> could consider making R throttle or debounce calls >> to >> R_CheckUserInterrupt such that a repeated calls within x >> time are >> ignored, cf: >> https://www.freecodecamp.org/news/javascript-debounce-example/ >> >> >> >> The reasoning being that it may be difficult for >> (contributed) code to >> determine when/where it is >> appropriate to check for interrupts, given >> varying >> code paths and cpu speed. Maybe it makes more sense to >> call >> R_CheckUserInterrupt frequently wherever it is >> safe to do so, and let >> R decide if reasonable time has >> elapsed to actually run the (possibly >> expensive) ui >> check again. >> >> >> >> Basic example: >> https://github.com/r-devel/r-svn/pull/125/files >> >> >> >> >> >> >> >> >> >>> {{ saving the whole user workspace is not "valid" in >> that sense >>> in my view. I tell all my (non-beginner) >> Rstudio-using >>> students they should turn *off* the >> automatic saving and >>> loading at session end / >> beginning; and for reproducibility >>> only saveRDS() [or >> save()] *explicitly* a few precious >>> objects .. >>> >> }} >> >>> >> >>> Again, we don't want to punish people who know what >> they are >>> doing, just because other R users manage to >> hang their R session >>> by too little thinking ... >> >>> >> >>> Your patch adds 15 such interrupt checking calls >> which may >>> really be too much -- I'm not claiming they >> are: with our >>> recursive objects it's surely not very >> easy to determine the >>> "minimally necessary" such >> calls. >> >>> >> >>> In addition, we may still consider adding an extra >> optional >>> argument, say `check.interrupt = TRUE` >>> >> which we may default to TRUE when save.image() is called >> >>> but e.g., not when serialize() is called.. >> >>> >> >>> Martin >> >>> >> >>> > -- >> >>> > Best regards, >>> > Ivan >>> > x[DELETED ATTACHMENT >> external: Rd_IvanKrylov_interrupt-serialize.patch, >> text/x-patch] >>> > >> ______________________________________________ >>> > >> R-devel@r-project.org mailing list >>> > >> https://stat.ethz.ch/mailman/listinfo/r-devel >> >>> >> >>> ______________________________________________ >>> >> R-devel@r-project.org mailing list >>> >> https://stat.ethz.ch/mailman/listinfo/r-devel >> >> ______________________________________________ >> >> R-devel@r-project.org mailing list >> >> https://stat.ethz.ch/mailman/listinfo/r-devel > >> ______________________________________________ > >> R-devel@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-devel >> >> >> -- >> Serguei Sokol Ingenieur de recherche INRAE >> >> Cellule Mathématiques TBI, INSA/INRAE UMR 792, INSA/CNRS >> UMR 5504 135 Avenue de Rangueil 31077 Toulouse Cedex 04 >> >> tel: +33 5 61 55 98 49 email: so...@insa-toulouse.fr >> http://www.toulouse-biotechnology-institute.fr/en/technology_platforms/mathematics-cell.html >> >> ______________________________________________ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel