New issue 2071: file.readinto() uses too much memory
https://bitbucket.org/pypy/pypy/issue/2071/filereadinto-uses-too-much-memory

Andrew Dalke:

I am using CFFI to read a file containing 7 GB of uint64_t data. I use 
ffi.new() to allocate the space, then call readinto() to fill the 
pre-allocated buffer, as suggested by the CFFI documentation. 
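
The relevant code looks roughly like this (a reconstruction, not a quote 
from the CFFI documentation; the file name and size are the ones shown in 
the transcript below):

```
#!python

import cffi

ffi = cffi.FFI()

# ~7 GB of uint64_t values, 8 bytes each (the size matches the
# os.path.getsize() call in the transcript below).
n = 7662345264 // 8
buf = ffi.new("uint64_t[]", n)

# readinto() a pre-allocated buffer, so no intermediate string should
# be needed -- this is the call that runs out of memory.
with open("pubchem.14", "rb") as f:
    f.readinto(ffi.buffer(buf))
```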

(Note: the docstring for readinto says "Undocumented. Don't use this; it may go 
away".)

It appears that something internal to readinto() makes a copy of the input, 
because the call ends up running out of memory on my 16 GB box, which has 
15 GB free.
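
For reference, the virtual memory figures mentioned below can be checked on 
Linux with a small helper like this (a sketch, Linux-specific, not part of 
the repro itself):

```
#!python

# Read this process's virtual memory size from /proc/self/status;
# the kernel reports VmSize in kB.
def vmsize_gb():
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmSize:"):
                return int(line.split()[1]) / (1024.0 ** 2)  # kB -> GB
    return None
```

Calling it just before the readinto() gives a baseline; top shows the 
growth during the call.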

I am able to reproduce the problem using the array module, so it is not some 
oddity of the CFFI implementation. Here is an example of what causes a problem 
on my machine:

```
#!python

>>>> import array

# (the definition of s was dropped from this transcript; a 2 GiB string
# matches len(a) == 3 * 2147483648 shown below)
>>>> s = "\0" * 2147483648
>>>> a = array.array("c", s)
>>>> a.extend(s)
>>>> a.extend(s)

# do some cleanup, to be on the safe side.
>>>> del s
>>>> import gc
>>>> gc.collect()
0

# Read ~6GB from a file with >7GB in it
>>>> len(a)
6442450944
>>>> filename = "pubchem.14"
>>>> import os
>>>> os.path.getsize(filename)
7662345264
>>>> infile = open(filename, "rb")

# Currently, virtual memory size = 8.87 GB
>>>> infile.readinto(a)
^CTerminated

# I killed it when the virtual memory was at 14 GB and still growing

```


