[issue26601] Use new madvise()'s MADV_FREE on the private heap

2017-08-23 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
stage:  -> needs patch
versions: +Python 3.7 -Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-22 Thread Antoine Pitrou

Antoine Pitrou added the comment:

All this discussion is in the context of the GNU libc allocator, but please 
remember that Python works on many platforms, including OS X, Windows, the 
*BSDs...

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-22 Thread Julian Taylor

Julian Taylor added the comment:

which is exactly what malloc is already doing, thus my point is that by using 
malloc we would fulfill your request.

But do you have an actual real-world application where this would help?
It is pretty easy to figure out: just run the application under perf and see if 
there is a relevant amount of time spent in page_fault/clear_pages.

And as mentioned you can already change the allocator for arenas at runtime, so 
you could also try changing it to malloc and see if your application gets any 
faster.

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-22 Thread Antti Haapala

Antti Haapala added the comment:

mmap is not the problem; the eagerness of munmap is a possible source of 
problems.

The munmap eagerness does not show problems in all programs because the arena 
allocation heuristics do not work as intended. A proper solution on Linux and 
other operating systems where it is supported is to put the freed arenas in a 
list, then mark them as freed with MADV_FREE. Now if the memory pressure grows, 
only *then* will the OS reclaim these. At any time the application can start 
reusing these arenas/pages; if they have not been reclaimed, the old contents 
will still be present there; if the operating system has reclaimed them, they 
will be remapped with zeroes.

Really the only downside of all this that I can foresee is that 
`ps/top/whatever` output would show Python using way more memory in its 
RSS/virt/whatever than it is actually using.
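
A minimal sketch of the free-list idea described in this message, assuming a Unix system and Python 3.8+ (for `mmap.madvise`). `ArenaCache`, `allocate` and `release` are hypothetical names used for illustration, not pymalloc's code. Where MADV_FREE is not available the sketch falls back to MADV_DONTNEED, or to giving no advice at all.

```python
import mmap

ARENA_SIZE = 256 * 1024  # arena size quoted in this thread

# MADV_FREE is only exposed where the platform defines it; fall back otherwise.
ADVICE = getattr(mmap, "MADV_FREE", getattr(mmap, "MADV_DONTNEED", None))


class ArenaCache:
    """Hypothetical free list of arenas (illustration only)."""

    def __init__(self):
        self._free = []

    def allocate(self):
        # Reuse a cached arena if there is one; no new mmap() call is needed.
        if self._free:
            return self._free.pop()
        # MAP_PRIVATE: MADV_FREE applies to private anonymous memory only.
        return mmap.mmap(-1, ARENA_SIZE, flags=mmap.MAP_PRIVATE)

    def release(self, arena):
        # Keep the mapping, but tell the kernel its contents no longer matter.
        if ADVICE is not None:
            try:
                arena.madvise(ADVICE)
            except OSError:
                pass  # the running kernel may not support this advice
        self._free.append(arena)


cache = ArenaCache()
arena = cache.allocate()
arena[:5] = b"hello"
cache.release(arena)
reused = cache.allocate()  # the same mapping comes back from the list
```

After `release()`, the first bytes of `reused` are either the old contents or zeroes, depending on which advice was applied and whether the kernel reclaimed the pages in between.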

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-22 Thread Julian Taylor

Julian Taylor added the comment:

I know one can change the allocator, but the default is mmap, which I don't 
think is a very good choice for the current arena size.
All the arguments about fragmentation and memory space also apply to Python's 
arena allocator itself, and I am not convinced that fragmentation of the libc 
allocator is a real problem for Python, as Python's allocation pattern is very 
well behaved _due_ to its own arena allocator. I don't doubt it, but I think it 
would be very valuable to document the actual real-world use case that 
triggered this change, just to avoid people stumbling over this again and again.

But then I also don't think that anything necessarily needs to be changed 
either; I have not seen the mmaps being a problem in any profiles of 
applications I work with.

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-22 Thread STINNER Victor

STINNER Victor added the comment:

I'm not sure that I understood correctly, but if you are proposing to use 
malloc()/free() instead of mmap()/munmap() to allocate arenas in pymalloc, you 
have to know that we already use different allocators depending on the platform:
https://docs.python.org/dev/c-api/memory.html#the-pymalloc-allocator

By the way, it is possible to modify the arena allocator at runtime:
https://docs.python.org/dev/c-api/memory.html#customize-pymalloc-arena-allocator

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-22 Thread Charles-François Natali

Charles-François Natali added the comment:

The heap on Linux is still a linear contiguous *address space*. I
agree that MADV_DONTNEED allows returning committed memory back to
the VM subsystem, but it is still using a large virtual memory area.
Not everyone runs on 64-bit, or can waste address space.
Also, not every Unix is Linux.

But it might make sense to use malloc on Linux, maybe only on 64-bit.

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-22 Thread Julian Taylor

Julian Taylor added the comment:

glibc's malloc is not obstack; it's not a simple linear heap where one object on 
top means everything below is not freeable. It also uses MADV_DONTNEED to give 
sbrk'd memory back to the system. This is the place where MADV_FREE can now be 
used, as the latter does not guarantee a page fault.
That said, of course you can construct workloads which lead to increased 
memory usage with malloc too, and maybe Python triggers them more often than 
other applications. Is there an existing issue showing the problem? It would 
be a good form of documentation in the source.

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-22 Thread Charles-François Natali

Charles-François Natali added the comment:

> Julian Taylor added the comment:
>
> it defaulted to 128 KB ten years ago; it has been a dynamic threshold for ages.

Indeed, and that's what encouraged switching the allocator to use mmap.
The problem with the dynamic mmap threshold is that since the Python
allocator uses fixed-size arenas, basically malloc always ends up
allocating from the heap (brk).
Which means that given that we don't use a - compacting - garbage
collector, after a while the heap would end up quite fragmented, or
never shrink: for example let's say you allocate 1GB - on the heap -
and then you free it, but a single object is allocated at the top
of the heap: your heap never shrinks back.
This has bitten people (and myself a couple of times at work).
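
The 1GB example can be put into numbers with a toy model of a contiguous, brk-style heap. This is an illustration under simplifying assumptions, not a description of glibc internals: `heap_top` is a hypothetical helper, blocks are `(start, size)` pairs, and the model ignores anything an allocator may do to hand interior pages back to the kernel.

```python
GIB = 1024 ** 3


def heap_top(live_blocks):
    """End address of the highest live block; the heap cannot shrink below it."""
    return max((start + size for start, size in live_blocks), default=0)


# 1 GiB of data at the bottom, then one small object allocated last, on top.
blocks = [(0, GIB), (GIB, 64)]
before = heap_top(blocks)

# Free the 1 GiB block; the 64-byte object still pins the top of the heap.
blocks.remove((0, GIB))
after = heap_top(blocks)

print(before == after)  # the top of the heap did not move
```

In this model the address range stays the same size even though almost all of it is free, which is the "never shrinks back" case described above.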

Now, I see several options:
- revert to using malloc, but this will re-introduce the original problem
- build some form of hysteresis in the arena allocation
- somewhat orthogonally, I'd be interested to see if we couldn't
increase the arena size

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-21 Thread Julian Taylor

Julian Taylor added the comment:

it defaulted to 128 KB ten years ago; it has been a dynamic threshold for ages.

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-21 Thread Julian Taylor

Julian Taylor added the comment:

ARENA_SIZE is 256 KB; the threshold in glibc can grow up to 32 MB
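
A simplified model of how these numbers interact, assuming that glibc raises its mmap threshold to the size of an mmapped chunk when that chunk is freed. `ToyMalloc`, the 4 KB page size and the 16-byte header are illustrative assumptions; only the 128 KB default, the 32 MB ceiling and the 256 KB arena size come from this thread.

```python
KIB = 1024
PAGE = 4 * KIB                    # assumed page size
HEADER = 16                       # assumed per-chunk bookkeeping overhead
THRESHOLD_MAX = 32 * KIB * KIB    # upper bound quoted in this thread
ARENA_SIZE = 256 * KIB            # arena size quoted in this thread


class ToyMalloc:
    """Toy model of a dynamic mmap threshold (not glibc's actual code)."""

    def __init__(self):
        self.mmap_threshold = 128 * KIB  # initial default quoted in this thread

    def malloc(self, request):
        """Return ('mmap', chunk_size) or ('heap', chunk_size)."""
        needed = request + HEADER
        if needed >= self.mmap_threshold:
            chunk = -(-needed // PAGE) * PAGE  # round up to whole pages
            return ("mmap", chunk)
        return ("heap", needed)

    def free(self, allocation):
        kind, chunk = allocation
        # Freeing an mmapped chunk raises the threshold to that chunk's size.
        if kind == "mmap" and self.mmap_threshold < chunk <= THRESHOLD_MAX:
            self.mmap_threshold = chunk


m = ToyMalloc()
first = m.malloc(ARENA_SIZE)   # 256 KB is above the 128 KB default: mmap
m.free(first)                  # the threshold is raised past 256 KB
second = m.malloc(ARENA_SIZE)  # the same request is now served from the heap
```

Under these assumptions only the first arena-sized request is served by mmap; once it has been freed, later requests of the same size fall below the raised threshold.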

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-21 Thread David Wilson

David Wilson added the comment:

@Julian note that ARENA_SIZE is double the threshold after which at least glibc 
resorts to calling mmap directly, so using malloc in place of mmap on at least 
Linux would have zero effect

--
nosy: +dw




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-21 Thread Julian Taylor

Julian Taylor added the comment:

The simplest way to fix this would be to use malloc instead of mmap in the 
allocator; then you also get MADV_FREE for free when malloc uses it.
The rationale for using mmap is kind of weak; the source just says "heap 
fragmentation". The usual argument for using mmap is not that but the instant 
return of memory to the system, quite the opposite of what the Python memory 
pool does.

--
nosy: +jtaylor




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-11 Thread Bar Harel

Bar Harel added the comment:

Any idea how to test it then? I found this happening by chance because I care 
about efficiency too much. We can't just stick timeit in random areas and hope 
to get results.

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Another question is how often this situation occurs in practice and whether 
it's worth spending some bits, CPU cycles and developer time on "fixing" this.

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-11 Thread Antti Haapala

Antti Haapala added the comment:

Also, what is important to notice is that the behaviour occurs *exactly* because 
the current heuristics *work*; the allocations were successfully organized so 
that one arena could be freed as soon as possible. The question is whether it is 
sane to try to free the few bits of free memory asap. Say you're now holding 
100M of memory: it often does not matter much if you hold the 100M of memory 
for *one second longer* than you actually ended up needing.

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-11 Thread Antti Haapala

Antti Haapala added the comment:

I said that *munmapping* is not the smart thing to do: and it is not, if you're 
going to *mmap* soon again.

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> ... and it turns out that munmapping is not always that smart thing to do:

I don't think a silly benchmark says anything about the efficiency of our 
allocation strategy. If you have a real-world use case where this turns up, 
then please post about it.

--
nosy: +pitrou




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-11 Thread Bar Harel

Changes by Bar Harel :


--
nosy: +bar.harel




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-11 Thread Antti Haapala

Antti Haapala added the comment:

> Maybe we need a heuristic to release the free arena after N calls to object 
> allocator functions which don't need this free arena.

That'd be my thought; again I believe that `madvise` could be useful there. 
`mmap`/`munmap`, I believe, is particularly slow because it actually needs to 
supply 256 KB of *zeroed* pages.

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-11 Thread STINNER Victor

STINNER Victor added the comment:

> ... and it turns out that munmapping is not always that smart thing to do: 
> http://stackoverflow.com/questions/36548518/variable-assignment-faster-than-one-liner

py -3 -m timeit "tuple(range(2000)) == tuple(range(2000))"
1 loops, best of 3: 97.7 usec per loop
py -3 -m timeit "a = tuple(range(2000));  b = tuple(range(2000)); a==b"
1 loops, best of 3: 70.7 usec per loop

Hum, it looks like this specific benchmark spends a lot of time allocating one 
arena and then releasing it.
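
The two command lines above can also be scripted with the `timeit` module, as in the sketch below. `best_usec` is a hypothetical helper and the loop counts are arbitrary; the numbers depend heavily on the Python version, platform and allocator, so the gap may not reproduce. One plausible reading of the difference is that with names, the previous tuples stay alive until `a` and `b` are rebound, so the arena is never completely empty between iterations.

```python
import timeit

ONE_LINER = "tuple(range(2000)) == tuple(range(2000))"
WITH_NAMES = "a = tuple(range(2000)); b = tuple(range(2000)); a == b"


def best_usec(stmt, number=200, repeat=3):
    """Best per-loop time in microseconds over `repeat` runs."""
    return min(timeit.repeat(stmt, number=number, repeat=repeat)) / number * 1e6


one_liner_usec = best_usec(ONE_LINER)
with_names_usec = best_usec(WITH_NAMES)
print(f"one-liner: {one_liner_usec:.1f} usec, with names: {with_names_usec:.1f} usec")
```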

Maybe we should keep one "free" arena to avoid the slow mmap/munmap. But it 
means that we keep 256 KB of unused memory.

Maybe we need a heuristic to release the free arena after N calls to object 
allocator functions which don't need this free arena.
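
A toy model of that heuristic, counting mmap/munmap calls instead of performing them. Everything here is hypothetical: `SpareArenaPolicy`, its method names and the `max_idle_calls` knob (the "N" above) are made up for illustration and are not part of pymalloc.

```python
class SpareArenaPolicy:
    """Keep at most one free arena; drop it after N calls that did not need it."""

    def __init__(self, max_idle_calls=1000):
        self.max_idle_calls = max_idle_calls  # the "N" of the heuristic
        self.has_spare = False
        self.idle_calls = 0
        self.mmap_calls = 0
        self.munmap_calls = 0

    def need_arena(self):
        # Reuse the spare arena if there is one, otherwise map a new one.
        if self.has_spare:
            self.has_spare = False
        else:
            self.mmap_calls += 1

    def arena_emptied(self):
        # Keep the first empty arena as the spare; unmap any further ones.
        if self.has_spare:
            self.munmap_calls += 1
        else:
            self.has_spare = True
            self.idle_calls = 0

    def allocator_call(self):
        # An allocator call that was served without touching the spare arena.
        if not self.has_spare:
            return
        self.idle_calls += 1
        if self.idle_calls >= self.max_idle_calls:
            self.has_spare = False
            self.munmap_calls += 1


policy = SpareArenaPolicy(max_idle_calls=1000)
for _ in range(100):           # the benchmark pattern: fill, then empty, an arena
    policy.need_arena()
    policy.arena_emptied()
thrash_counts = (policy.mmap_calls, policy.munmap_calls)

for _ in range(1000):          # a long run of calls that never need the spare
    policy.allocator_call()
final_counts = (policy.mmap_calls, policy.munmap_calls)
```

In this model the 100 fill/empty cycles cost a single mmap and no munmap, and the spare arena is released only after 1000 calls that did not need it. The price is the 256 KB of unused memory mentioned above, held for as long as the spare is kept.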

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-03-21 Thread STINNER Victor

STINNER Victor added the comment:

Are you aware of unused memory in the heap memory?

The pymalloc memory allocator uses munmap() to release a whole arena as
soon as the last memory block of an arena is freed.

--




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-03-21 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
nosy: +neologix




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-03-21 Thread SilentGhost

Changes by SilentGhost :


--
nosy: +haypo




[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-03-21 Thread Marcos Dione

New submission from Marcos Dione:

Linux kernel's new madvise() MADV_FREE[1] could be used in the memory allocator 
to signal unused parts of the private heap as such, allowing the kernel to use 
those pages for resolving lowmem pressure situations. From an LWN article[2]:

[...] Rather than reclaiming the pages immediately, this operation marks them 
for "lazy freeing" at some future point. Should the kernel run low on memory, 
these pages will be among the first reclaimed for other uses; should the 
application try to use such a page after it has been reclaimed, the kernel will 
give it a new, zero-filled page. But if memory is not tight, pages marked with 
MADV_FREE will remain in place; a future access to those pages will clear the 
"lazy free" bit and use the memory that was there before the MADV_FREE call. 

[...] MADV_FREE appears to be aimed at user-space memory allocator 
implementations. When an application frees a set of pages, the allocator will 
use an MADV_FREE call to tell the kernel that the contents of those pages no 
longer matter. Should the application quickly allocate more memory in the same 
address range, it will use the same pages, thus avoiding much of the overhead 
of freeing the old pages and allocating and zeroing the new ones. In short, 
MADV_FREE is meant as a way to say "I don't care about the data in this address 
range, but I may reuse the address range itself in the near future." 

Also note that this feature already exists in BSD kernels.
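
The behaviour quoted from LWN can be tried on a single anonymous mapping. This is a hedged sketch that assumes a Unix system and Python 3.8+; `lazy_free` is a hypothetical helper that tries MADV_FREE first and then MADV_DONTNEED, since neither constant is guaranteed to exist or to be accepted by the running kernel.

```python
import mmap

PAGE = mmap.PAGESIZE


def lazy_free(region):
    """Tell the kernel that `region`'s contents no longer matter.

    Returns the name of the advice that was applied, or None if none worked.
    """
    for name in ("MADV_FREE", "MADV_DONTNEED"):
        advice = getattr(mmap, name, None)
        if advice is None:
            continue  # constant not defined on this platform
        try:
            region.madvise(advice)
            return name
        except OSError:
            continue  # advice rejected by the running kernel
    return None


# Private anonymous memory, the kind of mapping MADV_FREE is meant for.
region = mmap.mmap(-1, 4 * PAGE, flags=mmap.MAP_PRIVATE)
region[:4] = b"data"
applied = lazy_free(region)

# The address range stays valid and can be reused right away.
leftover = region[:4]
```

`leftover` is either the old bytes or zeroes, depending on which advice was applied and on whether the kernel reclaimed the page before the read.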

--
[1] 
http://kernelnewbies.org/Linux_4.5#head-42578a3e087d5bcc2940954a38ce794fe2cd642c

[2] https://lwn.net/Articles/590991/

--
components: Interpreter Core
messages: 262117
nosy: StyXman
priority: normal
severity: normal
status: open
title: Use new madvise()'s MADV_FREE on the private heap
type: enhancement
versions: Python 3.6
