New issue 2071: file.readinto() uses too much memory https://bitbucket.org/pypy/pypy/issue/2071/filereadinto-uses-too-much-memory
Andrew Dalke: I am using CFFI to read a file containing 7 GB of uint64_t data. I use ffi.new() to allocate the space, then readinto() to fill the pre-allocated buffer, as suggested by the CFFI documentation. (Note: the docstring for readinto says "Undocumented. Don't use this; it may go away".) It appears that something internal to readinto makes a copy of the input, because readinto() ends up running out of memory on my 16 GB box, which has 15 GB free. I am able to reproduce the problem using the array module, so it is not some oddity of the CFFI implementation.

Here is an example of what causes a problem on my machine (`s` is a string built earlier in the session; judging from the `len(a)` below, it is 2 GiB):

```
#!python
>>>> import array
>>>> a = array.array("c", s)
>>>> a.extend(s)
>>>> a.extend(s)

# do some cleanup, to be on the safe side.
>>>> del s
>>>> import gc
>>>> gc.collect()
0

# Read ~6GB from a file with >7GB in it
>>>> len(a)
6442450944
>>>> filename = "pubchem.14"
>>>> import os
>>>> os.path.getsize(filename)
7662345264
>>>> infile = open(filename, "rb")

# Currently, virtual memory size = 8.87 GB
>>>> infile.readinto(a)
^CTerminated

# I killed it when the virtual memory was at 14 GB and still growing
```
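For reference, a minimal sketch of the CFFI pattern described above, since the report itself only shows the array-module reproduction. The variable names, the element count (derived from the file size in the transcript), and the use of ffi.buffer() to expose the cdata to readinto() are illustrative assumptions based on the idiom the CFFI documentation suggests:

```
#!python
# Sketch of the CFFI usage described in the report (Python 2 / PyPy).
# Names and sizes are illustrative; the original CFFI code is not shown.
import cffi

ffi = cffi.FFI()

filename = "pubchem.14"
count = 7662345264 // 8              # number of uint64_t values in the file

data = ffi.new("uint64_t[]", count)  # pre-allocate ~7 GB once, up front
with open(filename, "rb") as infile:
    # ffi.buffer() wraps the cdata so file.readinto() can fill it in
    # place; ideally this allocates no second copy of the file's data.
    infile.readinto(ffi.buffer(data))
```

With a correctly behaving readinto(), peak memory should stay close to the size of the pre-allocated buffer, which is why the growth to 14 GB in the transcript above looks like an internal copy being made.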