With

    if (!j--) {
        R_CheckUserInterrupt();
        j = 10000;
    }

as in current R devel (r83976), j goes negative (-1) before it is reset, and the interrupt check runs every 10001 iterations instead of every 10000. I prefer

    if (!--j) {
        R_CheckUserInterrupt();
        j = 10000;
    }

In current R devel (r83976), if EOF is reached, the outer loop keeps going: i keeps incrementing until it reaches nskip. The outer loop could be made to stop on EOF as well. Alternatively, the nested loop can be avoided, as in the following:

    if (nskip) for (R_xlen_t i = 0, j = 10000; ; ) { /* MBCS-safe */
        c = scanchar(FALSE, &data);
        if (!j--) {
            R_CheckUserInterrupt();
            j = 10000;
        }
        if ((c == '\n' && ++i == nskip) || c == R_EOF)
            break;
    }

-----------

On 2/11/23 09:33, Ivan Krylov wrote:
> On Fri, 10 Feb 2023 23:38:55 -0600
> Spencer Graves <spencer.graves using prodsyse.com> wrote:
>
>> I have a 4.54 GB file that I'm trying to read in chunks using
>> "scan(..., skip=__)". It works as expected for small values of
>> "skip" but goes into an infinite loop for "skip=1e11" and similar
>> large values of skip: I cannot even interrupt it; I must kill R.
> Skipping lines is done by two nested loops. The outer loop counts the
> lines to skip; the inner loop reads characters until it encounters a
> newline or end of file. The outer loop doesn't check for EOF and keeps
> asking for more characters until the inner loop runs at least once for
> every line it wants to skip.
> The following patch should avoid the wait in such cases:
>
> --- src/main/scan.c (revision 83797)
> +++ src/main/scan.c (working copy)
> @@ -835,7 +835,7 @@
>  attribute_hidden SEXP do_scan(SEXP call, SEXP op, SEXP args, SEXP rho)
>  {
>      SEXP ans, file, sep, what, stripwhite, dec, quotes, comstr;
> -    int c, flush, fill, blskip, multiline, escapes, skipNul;
> +    int c = 0, flush, fill, blskip, multiline, escapes, skipNul;
>      R_xlen_t nmax, nlines, nskip;
>      const char *p, *encoding;
>      RCNTXT cntxt;
> @@ -952,7 +952,7 @@
>              if(!data.con->canread)
>                  error(_("cannot read from this connection"));
>          }
> -        for (R_xlen_t i = 0; i < nskip; i++) /* MBCS-safe */
> +        for (R_xlen_t i = 0; i < nskip && c != R_EOF; i++) /* MBCS-safe */
>              while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF);
>      }
>
>
> Making it interruptible is a bit more work: we need to ensure that a
> valid context is set up and check regularly for an interrupt.
>
> --- src/main/scan.c (revision 83797)
> +++ src/main/scan.c (working copy)
> @@ -835,7 +835,7 @@
>  attribute_hidden SEXP do_scan(SEXP call, SEXP op, SEXP args, SEXP rho)
>  {
>      SEXP ans, file, sep, what, stripwhite, dec, quotes, comstr;
> -    int c, flush, fill, blskip, multiline, escapes, skipNul;
> +    int c = 0, flush, fill, blskip, multiline, escapes, skipNul;
>      R_xlen_t nmax, nlines, nskip;
>      const char *p, *encoding;
>      RCNTXT cntxt;
> @@ -952,8 +952,6 @@
>              if(!data.con->canread)
>                  error(_("cannot read from this connection"));
>          }
> -        for (R_xlen_t i = 0; i < nskip; i++) /* MBCS-safe */
> -            while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF);
>      }
>
>      ans = R_NilValue; /* -Wall */
> @@ -966,6 +964,10 @@
>      cntxt.cend = &scan_cleanup;
>      cntxt.cenddata = &data;
>
> +    if (ii) for (R_xlen_t i = 0, j = 0; i < nskip && c != R_EOF; i++) /* MBCS-safe */
> +        while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF)
> +            if (j++ % 10000 == 9999) R_CheckUserInterrupt();
> +
>      switch (TYPEOF(what)) {
>      case LGLSXP:
>      case INTSXP:
>
> This way, even if
> you pour a Decanter of Endless Lines (e.g. mkfifo
> LINES; perl -E'print "A"x42 while 1;' > LINES) into scan(), it can
> still be interrupted, even if neither newline nor EOF ever arrives.

Thanks, I've updated the implementation of scan() in R-devel to be
interruptible while skipping lines.

I've done it slightly differently as I found there already was a memory
leak, which could be fixed by creating the context a bit earlier.

I've also avoided modulo on the fast path as I saw 13% performance
overhead on my mailbox file. Decrementing and checking against zero
didn't have measurable overhead.

Best
Tomas

[snip]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
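
For illustration only, here is a small standalone C sketch of the two points raised at the top of this thread: the off-by-one between "if (!j--)" and "if (!--j)", and a single-loop skip that stops at EOF. It is not the code in src/main/scan.c. The names Reader, next_char, skip_lines, SKETCH_EOF and the "checks" counter are hypothetical stand-ins for the connection, scanchar(), R_EOF and R_CheckUserInterrupt(), and an in-memory buffer replaces the real input.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_EOF (-1)  /* hypothetical stand-in for R_EOF */

/* Iterations until the first check fires with the post-decrement
   test "if (!j--)": the test sees j before it is decremented. */
long first_check_postdec(long period)
{
    assert(period > 0);
    long j = period, n = 0;
    for (;;) {
        n++;
        if (!j--) return n;
    }
}

/* Same, with the pre-decrement test "if (!--j)". */
long first_check_predec(long period)
{
    assert(period > 0);
    long j = period, n = 0;
    for (;;) {
        n++;
        if (!--j) return n;
    }
}

/* In-memory input; "checks" counts simulated interrupt checks. */
typedef struct {
    const char *buf;
    size_t len, pos;
    long checks;
} Reader;

/* Stand-in for scanchar(): next byte, or SKETCH_EOF at the end. */
int next_char(Reader *r)
{
    return r->pos < r->len ? (unsigned char) r->buf[r->pos++] : SKETCH_EOF;
}

/* Skip up to nskip lines in a single loop. Stops at EOF, and runs a
   simulated interrupt check every `period` characters read.
   Returns the number of newlines actually consumed. */
long skip_lines(Reader *r, long nskip, long period)
{
    assert(period > 0);
    long i = 0, j = period;
    if (nskip <= 0) return 0;
    for (;;) {
        int c = next_char(r);
        if (!--j) {          /* where R_CheckUserInterrupt() would go */
            r->checks++;
            j = period;
        }
        if (c == '\n' && ++i == nskip) break;
        if (c == SKETCH_EOF) break;
    }
    return i;
}
```

With a period of 10000, first_check_postdec() returns 10001 and first_check_predec() returns 10000, which is the difference described above. Calling skip_lines() with a very large nskip on a short buffer returns as soon as the buffer is exhausted instead of counting up to nskip.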