As a workaround, I suggest readr::read_file.

--Ista

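A minimal sketch of that workaround, assuming the readr and jsonlite packages are installed. The small temporary file and the names `path`, `txt`, and `dat` are illustrative stand-ins, not part of the original report. One caveat: R limits a single character string to 2^31 - 1 bytes, so this may only help for files below that size.

```r
# Small stand-in for the real input; the actual file is a single ~2.1GB line.
path <- tempfile(fileext = ".json")
writeLines('{"a": [1, 2, 3], "b": "x"}', path, sep = "")

# read_file() returns the whole file as a length-one character vector,
# so the input never passes through readLines().
txt <- readr::read_file(path)

# Parse the JSON text as usual.
dat <- jsonlite::fromJSON(txt)
str(dat)
```

If the JSON arrives on stdin rather than in a file, the same idea would need a way to get the bytes into a single string first. That part is not shown here.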
On Sep 2, 2017 2:58 PM, "Jennifer Lyon" <jennifer.s.l...@gmail.com> wrote:

> Hi:
>
> I have a 2.1GB JSON file. Typically I use readLines() and
> jsonlite::fromJSON() to extract data from a JSON file.
>
> When I try and read in this file using readLines() R segfaults.
>
> I believe the two salient issues with this file are
> 1). Its size
> 2). It is a single line (no line breaks)
>
> I can reproduce this issue as follows
> #Generate a big file with no line breaks
> # In R
> > writeLines(paste0(c(letters, 0:9), collapse=""), "alpha.txt", sep="")
>
> # in unix shell
> cp alpha.txt file.txt
> for i in {1..26}; do cat file.txt file.txt > file2.txt && mv -f file2.txt file.txt; done
>
> This generates a 2.3GB file with no line breaks
>
> in R:
> > moo <- readLines("file.txt")
>
> *** caught segfault ***
> address 0x7cffffff, cause 'memory not mapped'
>
> Traceback:
> 1: readLines("file.txt")
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
> Selection: 3
>
> I conclude:
> I am potentially running up against a limit in R, which should give a
> reasonable error, but currently just segfaults.
>
> My question:
> Most of the content of the JSON is an approximately 100K x 6K JSON
> equivalent of a dataframe, and I know R can handle much bigger than this
> size. I am expecting these JSON files to get even larger. My R code lives
> in a bigger system, and the JSON comes in via stdin, so I have absolutely
> no control over the data format. I can imagine trying to incrementally
> parse the JSON so I don't bump up against the limit, but I am eager for
> suggestions of simpler solutions.
>
> Also, I apologize for the timing of this bug report, as I know folks are
> working to get out the next release of R, but like so many things I have no
> control over when bugs leap up.
>
> Thanks.
>
> Jen
>
> > sessionInfo()
> R version 3.4.1 (2017-06-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 14.04.5 LTS
>
> Matrix products: default
> BLAS: R-3.4.1/lib/libRblas.so
> LAPACK: R-3.4.1/lib/libRlapack.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.4.1
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel