Hello, All:

I have a 4.54 GB file that I'm trying to read in chunks using "scan(..., skip=__)". It works as expected for small values of "skip" but goes into an apparently infinite loop for "skip=1e11" and similarly large values: I cannot even interrupt it and must kill R. Below please find a toy example followed by sessionInfo().


My real problem is a large corrupted Thunderbird email file. Its file type is "Mork", which consists mostly of standard characters with "\n" between records of varying length.


Is there some other function in R that allows me to read chunks of a large file like this?
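One commonly used pattern, offered here as a minimal sketch rather than a fix for scan() itself, is to open a connection once and call readLines() on it repeatedly. Each call resumes where the previous one stopped, so nothing is re-skipped from the start of the file. The helper name read_in_chunks, its chunk_size default, and the process_chunk callback are hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not part of base R): read a text file in chunks of
# 'chunk_size' lines through a single open connection. Each readLines() call
# continues from the connection's current position, so no lines are re-skipped.
read_in_chunks <- function(path, chunk_size = 100000L, process_chunk) {
  con <- file(path, open = "r")
  on.exit(close(con))            # close the connection even if an error occurs
  n_chunks <- 0L
  repeat {
    # skipNul = TRUE drops embedded NUL bytes, which a corrupted file may contain
    lines <- readLines(con, n = chunk_size, warn = FALSE, skipNul = TRUE)
    if (length(lines) == 0L) break   # end of file reached
    n_chunks <- n_chunks + 1L
    process_chunk(lines, n_chunks)   # caller-supplied function (hypothetical)
  }
  invisible(n_chunks)                # number of chunks processed
}
```

For the toy file below, read_in_chunks('tstNums.txt', 5L, function(x, i) print(as.numeric(x))) should print three chunks: 1 to 5, 6 to 10, and 11.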


          Thanks,
          Spencer Graves


writeLines(as.character(1:11), 'tstNums.txt')
(Tst2 <- scan('tstNums.txt', n=12, skip=5))
# works: 6 7 8 9 10 11
(Tst13 <- scan('tstNums.txt', n=12, skip=13))
# works: numeric(0)
(tst1e11 <- scan('tstNums.txt', n=12, skip=1e11))
# Goes into an infinite loop that I cannot even interrupt.
# I must kill R and start over.
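Because skip counts lines from the beginning of the file on every call, another option for landing in the middle of a multi-gigabyte file may be to jump to a byte offset with seek() on a connection opened in binary mode and then read a fixed number of raw bytes. This sketch assumes the caller can work with byte offsets rather than record numbers; the helpers read_bytes_at and bytes_to_lines are hypothetical. The first record after an arbitrary offset will usually be partial and need to be discarded.

```r
# Hypothetical helper (not part of base R): read up to 'n_bytes' raw bytes
# starting at byte 'offset' (0-based) of the file at 'path'.
read_bytes_at <- function(path, offset, n_bytes = 1e6) {
  con <- file(path, open = "rb")   # binary mode, so offsets are exact bytes
  on.exit(close(con))
  seek(con, where = offset, origin = "start")
  readBin(con, what = "raw", n = n_bytes)  # raw(0) if offset is past the end
}

# Hypothetical companion: convert the bytes to lines, dropping NUL bytes
# (rawToChar() refuses embedded NULs) and splitting on "\n".
bytes_to_lines <- function(bytes) {
  bytes <- bytes[bytes != as.raw(0)]
  if (length(bytes) == 0L) return(character(0))
  strsplit(rawToChar(bytes), "\n", fixed = TRUE)[[1]]
}
```

An offset past the end of the file returns raw(0), analogous to the numeric(0) that scan() returns for skip=13 on the toy file.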


sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.7.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] compiler_4.2.2  fastmap_1.1.0   cli_3.6.0       htmltools_0.5.4
 [5] tools_4.2.2     rstudioapi_0.14 yaml_2.3.6      rmarkdown_2.20
 [9] knitr_1.41      xfun_0.36       digest_0.6.31   rlang_1.0.6
[13] evaluate_0.20

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
