Thank you for your suggestion. Unfortunately, while R doesn't segfault calling readr::read_file() on the test file I described, I get the error message:
Error in read_file_(ds, locale) : negative length vectors are not allowed

Jen

On Sat, Sep 2, 2017 at 1:38 PM, Ista Zahn <istaz...@gmail.com> wrote:

> As a work-around I suggest readr::read_file.
>
> --Ista
>
>
> On Sep 2, 2017 2:58 PM, "Jennifer Lyon" <jennifer.s.l...@gmail.com> wrote:
>
>> Hi:
>>
>> I have a 2.1GB JSON file. Typically I use readLines() and
>> jsonlite::fromJSON() to extract data from a JSON file.
>>
>> When I try and read in this file using readLines() R segfaults.
>>
>> I believe the two salient issues with this file are
>> 1). Its size
>> 2). It is a single line (no line breaks)
>>
>> I can reproduce this issue as follows
>> #Generate a big file with no line breaks
>> # In R
>> > writeLines(paste0(c(letters, 0:9), collapse=""), "alpha.txt", sep="")
>>
>> # in unix shell
>> cp alpha.txt file.txt
>> for i in {1..26}; do cat file.txt file.txt > file2.txt && mv -f file2.txt file.txt; done
>>
>> This generates a 2.3GB file with no line breaks
>>
>> in R:
>> > moo <- readLines("file.txt")
>>
>> *** caught segfault ***
>> address 0x7cffffff, cause 'memory not mapped'
>>
>> Traceback:
>> 1: readLines("file.txt")
>>
>> Possible actions:
>> 1: abort (with core dump, if enabled)
>> 2: normal R exit
>> 3: exit R without saving workspace
>> 4: exit R saving workspace
>> Selection: 3
>>
>> I conclude:
>> I am potentially running up against a limit in R, which should give a
>> reasonable error, but currently just segfaults.
>>
>> My question:
>> Most of the content of the JSON is an approximately 100K x 6K JSON
>> equivalent of a dataframe, and I know R can handle much bigger than this
>> size. I am expecting these JSON files to get even larger. My R code lives
>> in a bigger system, and the JSON comes in via stdin, so I have absolutely
>> no control over the data format. I can imagine trying to incrementally
>> parse the JSON so I don't bump up against the limit, but I am eager for
>> suggestions of simpler solutions.
>>
>> Also, I apologize for the timing of this bug report, as I know folks are
>> working to get out the next release of R, but like so many things I have
>> no control over when bugs leap up.
>>
>> Thanks.
>>
>> Jen
>>
>> > sessionInfo()
>> R version 3.4.1 (2017-06-30)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 14.04.5 LTS
>>
>> Matrix products: default
>> BLAS: R-3.4.1/lib/libRblas.so
>> LAPACK: R-3.4.1/lib/libRlapack.so
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.4.1
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
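
For reference, the arithmetic behind the test file and a sketch of the incremental route mentioned above. The reproduction doubles a 36-byte seed 26 times, which lands above 2^31 - 1 bytes, the documented limit on the number of bytes in a single R character string (see ?"Memory-limits"). A negative length in the readr error would be consistent with a byte count overflowing a signed 32-bit integer, but that is an assumption and has not been checked against readr's source. The helper read_in_chunks() below is hypothetical (it is not part of base R or readr) and only shows how a file can be walked in raw chunks with readBin(); it does not parse JSON.

```r
# Size of the test file from the reproduction: a 36-byte seed
# (letters a-z plus digits 0-9) doubled 26 times by the shell loop.
seed_bytes <- length(c(letters, 0:9))  # 36
file_bytes <- seed_bytes * 2^26        # double arithmetic, so no overflow here
file_bytes > .Machine$integer.max      # TRUE: 2415919104 > 2147483647

# Hypothetical helper: read a file in fixed-size raw chunks so that no
# single object comes near 2^31 - 1 bytes. `on_chunk` is called once per
# chunk; the default callback does nothing.
read_in_chunks <- function(path,
                           chunk_bytes = 64 * 1024^2,
                           on_chunk = function(chunk) NULL) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  total <- 0  # kept as a double so it can exceed .Machine$integer.max
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0L) break  # readBin returns raw(0) at end of file
    on_chunk(chunk)
    total <- total + length(chunk)
  }
  total  # number of bytes seen
}
```

Calling read_in_chunks("file.txt") on the file generated above should return its size in bytes without ever building one huge string. Turning the chunks into a data frame would still need an incremental JSON parser on top, which this sketch leaves out.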