### Re: [Numpy-discussion] float16/32: wrong number of digits?

> `float(repr(a)) == a` is guaranteed for Python `float`

And `np.float16(repr(a)) == a` is guaranteed for `np.float16` (and the same is true up to `float128`, which can be platform-dependent). Your code doesn't work because you're deserializing to a higher precision format than the one you serialized to.
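For illustration (the value is arbitrary), the round trip holds when the repr is parsed back at the same precision, and the mismatch only appears when it is parsed at a higher one:

```python
import numpy as np

a = np.float16(0.1)
assert np.float16(repr(a)) == a       # half -> str -> half: exact round trip

# Parsing the short half-precision repr as a double yields a different
# value -- the "higher precision format" mismatch described above.
assert np.float64(repr(a)) != np.float64(a)
```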

### [Numpy-discussion] caching large allocations on gnu/linux

Hi,

Numpy often allocates large arrays, and one factor in its performance is the cost of faulting memory from the kernel into the process. That cost is relatively significant; for example, in this operation on large arrays it accounts for 10-15% of the runtime:

```python
import numpy as np
a = np.ones(1000)
b = np.ones(1000)
%timeit (a * b)**2 + 3
```

```
54.45%  ipython  umath.so           [.] sse2_binary_multiply_DOUBLE
20.43%  ipython  umath.so           [.] DOUBLE_add
16.66%  ipython  [kernel.kallsyms]  [k] clear_page
```

The reason for this is that the glibc memory allocator uses memory mapping for large allocations instead of reusing already-faulted memory. It does so in order to return memory to the system immediately when it is freed, which keeps the whole system more robust. That makes a lot of sense in general, but not so much for many numerical applications, which are often the only thing running. And although an old paper showed that caching memory inside numpy speeds up many applications, numpy's usage is diverse, so we couldn't really diverge from the glibc behaviour.

Until Linux 4.5 added support for madvise(MADV_FREE). This flag of the madvise syscall tells the kernel that a piece of memory can be reused by other processes if there is memory pressure. Should another process claim the memory and the original process want to use it again, the kernel will fault new memory into its place, so it behaves exactly as if it had just been freed regularly. But when no other process claims the memory and the original process wants to reuse it, the memory does not need to be faulted again. So effectively this flag allows us to cache memory inside numpy that the rest of the system can still reclaim if required. Doing so gives the expected speedup in the above example.

An issue is that the memory usage of numpy applications will appear to increase. The memory that is actually free will still show up in the usual places you look at memory usage, namely the resident memory of the process in top, /proc, etc. The usage only goes down when the memory is actually needed by other processes. This would probably break some memory profiling tools, so we probably need a switch that lets profiling tools disable the caching.

Another concern is that using this functionality is really the job of the system memory allocator. I had a look at glibc's allocator, but retrofitting good use of MADV_FREE into it does not look like an easy job, so I don't expect that to happen anytime soon.

Should it be agreed that caching is worthwhile, I would propose a very simple implementation: we only really need to cache a small handful of array data pointers for the fast allocate/deallocate cycles that appear in common numpy usage. For example, a small list of maybe 4 pointers storing the 4 largest recent deallocations, where new allocations just pick the first memory block of sufficient size (a sketch follows below). The cache would only be active on systems that support MADV_FREE (which means Linux 4.5, and probably the BSDs too).

So what do you think of this idea?

cheers,
Julian
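A minimal sketch of the proposed policy, assuming the 4-slot choice from the message above. It is written in Python with `mmap` for illustration only (the real implementation would live in numpy's C allocator; the class name is hypothetical, and `mmap.madvise`/`mmap.MADV_FREE` require Python >= 3.8 on a supporting kernel):

```python
import mmap

class MadvFreeCache:
    """Sketch: keep the few largest recently freed blocks for reuse."""

    def __init__(self, slots=4):
        self.slots = slots
        self.blocks = []  # (size, mmap object) for recent deallocations

    def allocate(self, size):
        # Pick the first cached block of sufficient size; the kernel only
        # faults its pages back in if something else reclaimed them.
        for i, (bsize, buf) in enumerate(self.blocks):
            if bsize >= size:
                del self.blocks[i]
                return buf
        return mmap.mmap(-1, size)  # anonymous mapping, like malloc's mmap path

    def free(self, buf):
        # Instead of unmapping, mark the pages reclaimable (Linux >= 4.5)
        # and remember the block; the system takes the memory if it needs it.
        if hasattr(mmap, "MADV_FREE"):
            buf.madvise(mmap.MADV_FREE)
            self.blocks.append((len(buf), buf))
            self.blocks.sort(key=lambda t: t[0], reverse=True)
            for _, old in self.blocks[self.slots:]:
                old.close()  # evict the smallest blocks beyond the slot limit
            del self.blocks[self.slots:]
        else:
            buf.close()  # no MADV_FREE: behave like a plain free
```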

### Re: [Numpy-discussion] caching large allocations on gnu/linux

On Mon, Mar 13, 2017 at 12:21 PM Julian Taylor <jtaylor.deb...@googlemail.com> wrote:

> Should it be agreed that caching is worthwhile I would propose a very
> simple implementation. We only really need to cache a small handful of
> array data pointers for the fast allocate deallocate cycle that appear
> in common numpy usage.
> For example a small list of maybe 4 pointers storing the 4 largest
> recent deallocations. New allocations just pick the first memory block
> of sufficient size.
> The cache would only be active on systems that support MADV_FREE (which
> is linux 4.5 and probably BSD too).
>
> So what do you think of this idea?

This is an interesting thought, and potentially a nontrivial speedup with zero user effort. But coming up with an appropriate caching policy is going to be tricky. The thing is, for each array, numpy grabs a block of "the right size", and that size can easily vary by orders of magnitude, even among the temporaries of a single expression, as a result of broadcasting (see the example below). So simply giving each new array the smallest cached block that will fit could easily leave small arrays sitting in giant allocated blocks, wasting non-reclaimable memory. So really you want to recycle blocks of the same size, or nearly so, which argues for a fairly large cache with smart indexing of some kind.

How much difference is this likely to make? Note that numpy is now, in some cases, able to eliminate the allocation of temporary arrays entirely.

I think the only way to answer these questions is to set up a trial implementation with user-switchable behaviour (which should include the ability for users to switch it on even when MADV_FREE is not available) and sensible statistics reporting. Then volunteers can run various numpy workloads past it.

Anne
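To make the size-variation point concrete, an illustrative example (array sizes chosen arbitrarily):

```python
import numpy as np

# The inputs are ~24 KB each, but broadcasting makes the temporary
# produced by a * b a 3000 x 3000 array of float64, roughly 72 MB --
# more than three orders of magnitude larger than either input.
a = np.ones((3000, 1))
b = np.ones(3000)
c = (a * b)**2 + 3  # each intermediate here is a separate ~72 MB block
```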

### Re: [Numpy-discussion] float16/32: wrong number of digits?

On Mon, Mar 13, 2017 at 12:57 PM Eric Wieser wrote:

> > `float(repr(a)) == a` is guaranteed for Python `float`
>
> And `np.float16(repr(a)) == a` is guaranteed for `np.float16` (and the
> same is true up to `float128`, which can be platform-dependent). Your
> code doesn't work because you're deserializing to a higher precision
> format than you serialized to.

I would hesitate to make this guarantee - certainly for old versions of numpy, `np.float128(repr(x)) != x` in many cases. I submitted a patch, now accepted, that probably accomplishes this on most systems (in fact it is now in the test suite), but if you are using a version of numpy that is a couple of years old, there is no way to convert long doubles to human-readable strings, or back, without losing precision.

To repeat: only in recent versions of numpy can long doubles be converted to human-readable form and back without passing through doubles. It is still not possible to use % or format() on them without discarding all precision beyond that of doubles. If you actually need long doubles (and if you don't, why use them?), make sure your application includes a test for this ability. I recommend checking repr(1+np.finfo(np.longdouble).eps).

Anne

P.S. You can write (I have) a short piece of Cython code that will reliably repr long doubles and parse them back, but on old versions of numpy it's just not possible from within Python. -A
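A sketch of the self-test Anne recommends (the assertion should pass on numpy versions that include the fixed long double repr, and fail on older ones, which is exactly the signal you want before trusting long double serialization):

```python
import numpy as np

# One eps above 1 is the smallest long double distinguishable from 1.0;
# a lossless repr/parse round trip must preserve it exactly.
x = np.longdouble(1) + np.finfo(np.longdouble).eps
assert np.longdouble(repr(x)) == x, "long double repr is lossy on this numpy"
```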

### Re: [Numpy-discussion] caching large allocations on gnu/linux

On 13.03.2017 16:21, Anne Archibald wrote:

> This is an interesting thought, and potentially a nontrivial speedup
> with zero user effort. But coming up with an appropriate caching policy
> is going to be tricky. The thing is, for each array, numpy grabs a block
> "the right size", and that size can easily vary by orders of magnitude,
> even within the temporaries of a single expression as a result of
> broadcasting. So simply giving each new array the smallest cached block
> that will fit could easily result in small arrays in giant allocated
> blocks, wasting non-reclaimable memory. So really you want to recycle
> blocks of the same size, or nearly, which argues for a fairly large
> cache, with smart indexing of some kind.

The nice thing about MADV_FREE is that we don't need any clever cache: the same process that marked the pages free can reclaim them in another allocation, at least that is what my testing indicates it allows. So a small allocation getting a huge memory block does not waste memory, as the unused top part gets reclaimed when needed, either by numpy itself doing another allocation or by a different program on the system.

An issue that does arise, though, is that this memory is not available for the page cache used to cache on-disk data. A too-large cache might then be detrimental for IO-heavy workloads that rely on the page cache. So we might want to cap it at some maximum size, provide an explicit on/off switch, and/or have numpy's IO functions clear the cache.
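A small demonstration of the semantics described above (assuming Linux >= 4.5 and Python >= 3.8, where `mmap.madvise` and `mmap.MADV_FREE` are exposed; the original discussion predates that Python API, so take this as an after-the-fact illustration):

```python
import mmap

buf = mmap.mmap(-1, 1024 * 1024)   # 1 MiB anonymous mapping
buf[:5] = b"hello"

if hasattr(mmap, "MADV_FREE"):     # only defined where the kernel supports it
    buf.madvise(mmap.MADV_FREE)    # pages are now reclaimable by the system

# The mapping stays valid either way. If nothing reclaimed the pages we
# read "hello" back without new page faults; under memory pressure the
# kernel may have taken them, and they come back as fresh zeroed pages.
print(buf[:5])                     # b'hello' or b'\x00\x00\x00\x00\x00'
```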

### Re: [Numpy-discussion] caching large allocations on gnu/linux

2017-03-13 18:11 GMT+01:00 Julian Taylor:

> The nice thing about MADV_FREE is that we don't need any clever cache.
> The same process that marked the pages free can reclaim them in another
> allocation, at least that is what my testing indicates it allows.
> So a small allocation getting a huge memory block does not waste memory
> as the top unused part will get reclaimed when needed, either by numpy
> itself doing another allocation or a different program on the system.

Well, what you say makes a lot of sense to me, so if you have tested that, then I'd say this is worth a PR so we can see how it works on different workloads.

> An issue that does arise though is that this memory is not available for
> the page cache used for caching on disk data. A too large cache might
> then be detrimental for IO heavy workloads that rely on the page cache.

Yeah. Also, memory-mapped arrays use the page cache intensively, so we should test this use case and see how the caching affects memory-map performance.

> So we might want to cap it to some max size, provide an explicit on/off
> switch and/or have numpy IO functions clear the cache.

Definitely; allowing this feature to be disabled dynamically would be desirable. That would provide an easy path for testing how it affects performance. Would that be feasible?

Francesc
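A minimal sketch of the kind of memory-map test Francesc suggests (the file size and workload are illustrative, reopening the temporary file by name assumes a POSIX system, and the timing only becomes meaningful when compared with the allocation cache switched on versus off):

```python
import tempfile
import timeit

import numpy as np

# Repeated reductions over an on-disk array go through the kernel page
# cache, which is exactly the resource a large MADV_FREE cache competes for.
with tempfile.NamedTemporaryFile() as f:
    np.ones(10**7).tofile(f)   # ~80 MB of float64 on disk
    f.flush()
    m = np.memmap(f.name, dtype=np.float64, mode="r")
    print(timeit.timeit(m.sum, number=10))  # compare with caching on vs. off
```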

### [Numpy-discussion] docs.scipy.org down

Is https://docs.scipy.org/ being down a known issue?

Ryan May

### Re: [Numpy-discussion] docs.scipy.org down

On Tue, Mar 14, 2017 at 8:16 AM, Ryan May wrote:

> Is https://docs.scipy.org/ being down a known issue?

It is, and is being worked on; the tracking issue is https://github.com/numpy/numpy/issues/8779.

Thanks for reporting,
Ralf