Charles-François Natali added the comment:

> @Charles-François: I think your worries about calloc and overcommit are 
> unjustified. First, calloc and malloc+memset actually behave the same way 
> here -- with a large allocation and overcommit enabled, malloc and calloc 
> will both go ahead and return the large allocation, and then the actual 
> out-of-memory (OOM) event won't occur until the memory is accessed. In the 
> malloc+memset case this access will occur immediately after the malloc, 
> during the memset -- but this is still too late for us to detect the malloc 
> failure.

Not really: what you describe only holds for a single object.
But if you allocate, let's say, 1000 such objects at once:
- in the malloc + memset case, the committed pages are accessed
progressively (the pages for object N are touched before the memory
for object N+1 is allocated), so they are counted not only as
committed but also as resident (for example, the RSS grows
gradually). So at some point, even though the Linux VM subsystem is
by default very lenient about overcommitting, malloc/mmap will likely
return NULL because of this.
- in the calloc() case, all the memory is committed first but not
touched, so the kernel will likely happily overcommit all of it. Only
when you start accessing the pages progressively will the OOM killer
kick in.
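To make the point above concrete, here is a toy simulation, not kernel code. `ToyVM` and `allocate_many` are hypothetical names, and the model assumes each new mapping is checked only against memory that has already been touched (roughly the per-request heuristic check done with vm.overcommit_memory = 0; the real kernel accounting is more involved).

```python
from dataclasses import dataclass


@dataclass
class ToyVM:
    """Hypothetical, heavily simplified model of heuristic overcommit.

    Assumption: a new mapping is refused only if it is larger than the memory
    not yet touched; committed-but-untouched pages are not charged against it.
    """
    total_pages: int
    committed_pages: int = 0  # pages handed out by successful mappings
    resident_pages: int = 0   # pages actually touched (what shows up in RSS)

    def available(self) -> int:
        return self.total_pages - self.resident_pages

    def map_pages(self, npages: int) -> bool:
        if npages > self.available():
            return False  # like mmap failing with ENOMEM
        self.committed_pages += npages
        return True

    def touch(self, npages: int) -> None:
        self.resident_pages += npages


def allocate_many(vm: ToyVM, count: int, npages: int, zero_eagerly: bool) -> int:
    """Return how many of `count` allocations succeed before the first failure.

    zero_eagerly=True mimics malloc+memset: each object's pages are touched
    before the next object is requested. zero_eagerly=False mimics calloc on
    fresh pages: nothing is touched yet.
    """
    for i in range(count):
        if not vm.map_pages(npages):
            return i  # in CPython this failure would surface as MemoryError
        if zero_eagerly:
            vm.touch(npages)
    return count


malloc_memset = ToyVM(total_pages=1000)
calloc_like = ToyVM(total_pages=1000)
print(allocate_many(malloc_memset, 20, 100, zero_eagerly=True))   # 10
print(allocate_many(calloc_like, 20, 100, zero_eagerly=False))    # 20
print(calloc_like.committed_pages)                                # 2000
```

In the calloc-like run every request succeeds and 2000 pages end up committed against 1000 available; in this model the failure only shows up later, when those pages are touched, which is where the real system would invoke the OOM killer instead of returning NULL.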

> Second, OOM does not cause segfaults on any system I know. On Linux it wakes 
> up the OOM killer, which shoots some random (possibly guilty) process in the 
> head. The actual program which triggered the OOM is quite likely to escape 
> unscathed.

Ah, did I say segfault?
Sorry, I of course meant that the process will get nuked by the OOM killer.

> In practice, the *only* cases where you can get a MemoryError on modern 
> systems are (a) if the user has turned overcommit off, (b) you're on a tiny 
> embedded system that doesn't have overcommit, (c) if you run out of virtual 
> address space. None of these cases are affected by the differences between 
> malloc and calloc.

That's a common misconception: provided that the allocated memory is
accessed progressively (see the point above), you'll often get ENOMEM
even with overcommit enabled:

$ /sbin/sysctl -a | grep overcommit
vm.nr_overcommit_hugepages = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50

$ cat /tmp/test.py
l = []

with open('/proc/self/status') as f:
    try:
        for i in range(50000000):
            l.append(i)
    except MemoryError:
        for line in f:
            if 'VmPeak' in line:
                print(line)
        raise

$ python /tmp/test.py
VmPeak:   720460 kB

Traceback (most recent call last):
  File "/tmp/test.py", line 7, in <module>
    l.append(i)
MemoryError

I have a 32-bit machine, but the process definitely has more than
720MB of address space available (32-bit Linux typically gives user
space 3GB) ;-)

If your statement were true, it would be almost impossible to get
ENOMEM with overcommit enabled on a 64-bit machine, which is,
fortunately, not the case. Just try python -c "[i for i in
range(<large value>)]" on a 64-bit machine: I bet you'll get a
MemoryError (ENOMEM).
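The script above reads VmPeak from /proc/self/status; the same file also exposes VmRSS, which makes the committed-versus-touched distinction from the first point directly observable. This is a Linux-only sketch under stated assumptions: a private anonymous mmap() stands in for a large allocation (glibc typically serves allocations above its mmap threshold this way), writing one byte per page stands in for memset, and leaving the fresh pages alone stands in for calloc, which can skip the zeroing because the kernel already hands out zero-filled pages. `vm_rss_kb` and `rss_growth_kb` are hypothetical helper names.

```python
import mmap


def vm_rss_kb() -> int:
    """Return this process's resident set size (VmRSS) in kB; Linux-only."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")


def rss_growth_kb(size: int, touch: bool) -> int:
    """Map `size` bytes of private anonymous memory and return the RSS growth in kB.

    touch=True writes one byte per page (like memset right after malloc);
    touch=False leaves the zero-filled pages untouched (like calloc on fresh pages).
    """
    before = vm_rss_kb()
    # fileno=-1 gives an anonymous mapping; MAP_PRIVATE keeps it process-local.
    m = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    try:
        if touch:
            for offset in range(0, size, mmap.PAGESIZE):
                m[offset] = 0  # the first write to each page faults it in
        return vm_rss_kb() - before
    finally:
        m.close()  # unmap so the next measurement starts from a clean slate
```

For example, comparing rss_growth_kb(64 * 1024 * 1024, touch=True) with the same call using touch=False should show the touched mapping adding roughly its full size to VmRSS, while the untouched one barely moves it, even though both mappings were created the same way.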

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21233>
_______________________________________