[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-31 Thread STINNER Victor

STINNER Victor  added the comment:

I merged my PR 4199 (Document PyObject_Malloc()) and PR 4200 (Cleanup pymalloc) 
to prepare PR 4089.

PR 4089 should now be completed and well tested.

The real question now is whether we need PyMem_AlignedAlloc().

Stefan Krah and Nathaniel Smith are interested in aligned memory allocations, 
but both wrote that they don't plan to use PyMem_AlignedAlloc(), if I understood 
correctly. What's the point of adding PyMem_AlignedAlloc() in that case?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-31 Thread Stefan Krah

Stefan Krah  added the comment:

> For large allocations, you'll probably be better off implementing your own 
> aligned allocator on top of calloc than implementing your own calloc on top 
> of an aligned allocator. (It's O(1) overhead versus O(n).) And once you're 
> doing that you might want to use the same code for regular allocations too, 
> so that you don't need to keep track of whether each memory block used 
> aligned_calloc or aligned_malloc and can treat them the same... Depends on 
> your exact circumstances.

Yes, but if the whole array is initialized with actual values, then
the memset() overhead is not very large (something like 16% here).

If uninitialized (or very sparse), the overhead is of course gigantic.

What is more, in some crude tests the posix_memalign() performance isn't
that great compared to malloc()/calloc().

C11 aligned_alloc() is also quite a bit faster than posix_memalign() here.

I think you're right that a hand-rolled solution on top of calloc() is
best for my use case.

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-31 Thread STINNER Victor

STINNER Victor  added the comment:


New changeset 9ed83c40855b57c10988f76770a4eb825e034cd8 by Victor Stinner in 
branch 'master':
bpo-18835: Cleanup pymalloc (#4200)
https://github.com/python/cpython/commit/9ed83c40855b57c10988f76770a4eb825e034cd8


--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-31 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

> But since no fast (kernel-zeroed) aligned_calloc() exists, I must use 
> memset() anyway. 

For large allocations, you'll probably be better off implementing your own 
aligned allocator on top of calloc than implementing your own calloc on top of 
an aligned allocator. (It's O(1) overhead versus O(n).) And once you're doing 
that you might want to use the same code for regular allocations too, so that 
you don't need to keep track of whether each memory block used aligned_calloc 
or aligned_malloc and can treat them the same... Depends on your exact 
circumstances.
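
A minimal sketch of the approach described above: an aligned, zero-filled
allocation built on top of calloc(), so the bookkeeping cost is O(1) and no
memset() is needed. The names aligned_calloc_sketch() and
aligned_free_sketch() are hypothetical, and saving the original pointer just
before the returned block is only one possible bookkeeping choice; nothing
here is taken from PR 4089.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: zero-filled block whose address is a multiple of
   `alignment` (which must be a power of 2). Returns NULL on failure. */
static void *
aligned_calloc_sketch(size_t alignment, size_t nelem, size_t elsize)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0) {
        return NULL;                      /* not a power of 2 */
    }
    if (elsize != 0 && nelem > SIZE_MAX / elsize) {
        return NULL;                      /* nelem * elsize would overflow */
    }
    size_t size = nelem * elsize;
    /* Worst-case padding plus room for the saved original pointer. */
    size_t extra = (alignment - 1) + sizeof(void *);
    if (size > SIZE_MAX - extra) {
        return NULL;
    }

    unsigned char *raw = calloc(1, size + extra);
    if (raw == NULL) {
        return NULL;
    }
    /* Skip the pointer slot, then round up to the requested alignment. */
    uintptr_t addr = ((uintptr_t)raw + sizeof(void *) + alignment - 1)
                     & ~((uintptr_t)alignment - 1);
    unsigned char *ptr = raw + (addr - (uintptr_t)raw);
    assert((size_t)(ptr - raw) >= sizeof(void *));

    /* memcpy() avoids a misaligned store when alignment < sizeof(void *). */
    void *saved = raw;
    memcpy(ptr - sizeof(saved), &saved, sizeof(saved));
    return ptr;
}

/* Release a block obtained from aligned_calloc_sketch(). */
static void
aligned_free_sketch(void *ptr)
{
    if (ptr == NULL) {
        return;
    }
    void *saved;
    memcpy(&saved, (unsigned char *)ptr - sizeof(saved), sizeof(saved));
    free(saved);
}
```

Because the block comes from calloc(), large allocations can still benefit
from pages that the kernel already zeroed; the price is up to
alignment - 1 + sizeof(void *) extra bytes per block and a dedicated free
function.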

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-31 Thread STINNER Victor

Change by STINNER Victor :


--
pull_requests: +4168




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-31 Thread Stefan Krah

Stefan Krah  added the comment:

On Tue, Oct 31, 2017 at 02:55:04PM +, Nathaniel Smith wrote:
> 3) also it's not clear what the best approach will look like, given that we 
> care a lot about using calloc when possible, and have reason to prefer using 
> regular freeing functions whenever possible.

I actually have the same problems. But since no fast (kernel-zeroed)
aligned_calloc() exists, I must use memset() anyway.

So an emulated aligned_calloc() should probably not go into CPython
since it doesn't provide any performance advantages.

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-31 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

On 31/10/2017 at 15:55, Nathaniel Smith wrote:
> 
> 1) numpy hasn't actually come to a decision about whether to use aligned 
> allocation at all, or under what circumstances.

This isn't the Numpy bug tracker, but I can't help but mention that if
Numpy grew a facility for users to override the memory allocators it
invokes to allocate array data, Numpy may not have to come to a decision
about this at all... ;-) And it would also help specialized
accelerators, which may want to direct Numpy arrays to e.g. memory
that's cheaply shared with the GPU.

(see https://github.com/numpy/numpy/pull/5470)

> I wasn't making a criticism of your API; "it's not you, it's us" :-). But 
> this is a complicated and subtle area that's not really part of CPython's 
> core competency, and coming at a time when people are fretting about how to 
> shrink the C APIs surface area. E.g. I can think of more interesting ways for 
> the PyPy folks to spend their time than implementing an aligned_alloc 
> wrapper...

The same argument can be made for any part of the stdlib or core
language that PyPy has to reproduce.  Besides, I don't think
implementing an aligned_alloc wrapper is very difficult.  The hard part
is getting an agreement over the exposed APIs, and that's CPython's job,
not PyPy's ;-)

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-31 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

> Can you elaborate why numpy wouldn't use this new API? I designed it with 
> numpy in mind :-)

The reasons I had in mind are:

1) numpy hasn't actually come to a decision about whether to use aligned 
allocation at all, or under what circumstances.

2) if we do use it, we'll probably need our own implementation anyway to 
support old pythons.

3) also it's not clear what the best approach will look like, given that we 
care a lot about using calloc when possible, and have reason to prefer using 
regular freeing functions whenever possible.

I wasn't making a criticism of your API; "it's not you, it's us" :-). But this 
is a complicated and subtle area that's not really part of CPython's core 
competency, and coming at a time when people are fretting about how to shrink 
the C APIs surface area. E.g. I can think of more interesting ways for the PyPy 
folks to spend their time than implementing an aligned_alloc wrapper...

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-31 Thread STINNER Victor

STINNER Victor  added the comment:

Stefan Krah: "we care about the C11 restriction? (...) "size - number of bytes 
to allocate. An integral multiple of alignment" (...) posix_memalign and 
_aligned_malloc don't care about the multiple."

I prefer to ignore this restriction at this point.

I wouldn't be surprised if posix_memalign() and _aligned_malloc() already align 
the size for us internally.

We can add the restriction later, if needed.

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-31 Thread STINNER Victor

STINNER Victor  added the comment:

Nathaniel Smith: "Given the complexities here, and that the Track/Untrack 
functions are public now, I do wonder if the actual aligned allocation routines 
should just be an internal API (i.e., not exposed in Python.h)."

I don't see why we would hide PyMem_AlignedAlloc() but still require 
aligned_alloc to be implemented in PyMem_SetAllocator().

The plan is also to slowly start using PyMem_AlignedAlloc() internally for 
performance.

Can you elaborate on the "complexities"? Do you mean that the proposed 
PyMem_AlignedAlloc() API is more complex than calling posix_memalign() directly?

PyMem_AlignedAlloc() is designed for performance. For best performance, CPUs 
require memory to be aligned on convenient values like powers of 2 ;-) I also 
understand that alignment must be a multiple of sizeof(void*) because CPUs work 
on "CPU words". On a 64-bit CPU, a word is 8 bytes. If the memory is aligned on 
4 bytes, the CPU may have to fetch two words, and you lose the advantage of 
memory alignment.

I understand that the PyMem_AlignedAlloc() requirements come from the CPU 
architecture; they are not an arbitrary limitation just for fun ;-)
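
To make the alignment requirement concrete, here is a small sketch of how a
caller could check whether an address is aligned. The helper name
is_aligned() is hypothetical and is not part of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if `ptr` is a multiple of `alignment`, else 0.
   `alignment` must be a power of 2, so a bit mask can replace the modulo. */
static int
is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

For example, an address ending in 0x44 passes the check for a 4-byte
alignment but fails it for 8 bytes.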

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-31 Thread STINNER Victor

STINNER Victor  added the comment:

Nathaniel: "(...) and numpy won't necessarily use this API anyway."

Can you elaborate on why numpy wouldn't use this new API? I designed it with 
numpy in mind :-)

Using PyMem_AlignedAlloc() instead of calling 
posix_memalign()/_aligned_malloc() directly provides the debug features for free:

* tracemalloc is able to trace memory allocations
* detect buffer underflow
* detect buffer overflow
* detect API misuse like PyMem_Free(PyMem_AlignedAlloc()) -- it doesn't detect 
free(PyMem_AlignedAlloc()) which is plain wrong on Windows (but this one should 
crash immediately ;-))

Other advantages:

* PyMem_AlignedAlloc(alignment, 0) is well defined: it never returns NULL
* PyMem_AlignedAlloc(alignment, size) checks on alignment value are the same on 
all operating systems

Moreover, Python takes care of the portability for you :-)

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-28 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

Given the complexities here, and that the Track/Untrack functions are public 
now, I do wonder if the actual aligned allocation routines should just be an 
internal API (i.e., not exposed in Python.h).

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-28 Thread Stefan Krah

Stefan Krah  added the comment:

> The ways we've discussed using aligned allocation in numpy wouldn't follow 
> this requirement without special checking. Which isn't necessarily a big 
> deal, and numpy won't necessarily use this API anyway. But I would suggest 
> being very clear about exactly what you guarantee and what you don't :-).

In the GitHub issue we sort of decided to make the more relaxed Posix
semantics official:

'alignment' must be a power of 2 and a multiple of 'sizeof(void *)'.

'size' can be really anything, so it should work for numpy.

It's a pity that Posix does not round up align={1,2,4} to 'sizeof(void *)'
automatically (why not?), so the applications will have to do that.
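
The rule above can be sketched as two small helpers: one that checks an
alignment against the POSIX semantics, and one that does the rounding an
application would have to do for small alignments. Both function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if `alignment` is a power of 2 and a multiple of sizeof(void *),
   which is what posix_memalign() accepts. */
static int
alignment_is_valid(size_t alignment)
{
    return alignment != 0
        && (alignment & (alignment - 1)) == 0
        && alignment % sizeof(void *) == 0;
}

/* Round a small power-of-2 alignment such as 1, 2 or 4 up to
   sizeof(void *); larger alignments are returned unchanged. */
static size_t
normalize_alignment(size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return alignment < sizeof(void *) ? sizeof(void *) : alignment;
}
```

A caller would apply normalize_alignment() first and pass the result to the
allocator; any power of 2 then satisfies alignment_is_valid().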

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-27 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

>  On the other hand, sane requests will have the exact multiple most of the 
> time anyway.

The ways we've discussed using aligned allocation in numpy wouldn't follow this 
requirement without special checking. Which isn't necessarily a big deal, and 
numpy won't necessarily use this API anyway. But I would suggest being very 
clear about exactly what you guarantee and what you don't :-).

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-27 Thread STINNER Victor

STINNER Victor  added the comment:

PR 4089 became much larger than I expected, so I propose to defer 
enhancements to a follow-up PR, especially the idea of "emulating" 
PyMem_AlignedAlloc() on top of PyMem_Malloc() if the user calls 
PyMem_SetAllocator() without implementing aligned_alloc.

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-27 Thread Stefan Krah

Stefan Krah  added the comment:

Should we care about the C11 restriction?

http://en.cppreference.com/w/c/memory/aligned_alloc

"size - number of bytes to allocate. An integral multiple of alignment"



posix_memalign and _aligned_malloc don't care about the multiple.

On the other hand, sane requests will have the exact multiple most
of the time anyway.
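
If the C11 restriction did matter, a wrapper could round the size up before
calling aligned_alloc(). A minimal sketch, with a hypothetical helper name and
assuming the alignment is a power of 2:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round `size` up to the next multiple of `alignment` (a power of 2).
   Returns 0 if the rounded value would overflow size_t; note that a
   `size` of 0 also yields 0. */
static size_t
round_up_to_alignment(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (size > SIZE_MAX - (alignment - 1)) {
        return 0;
    }
    return (size + alignment - 1) & ~(alignment - 1);
}
```

With a 64-byte alignment, a 100-byte request becomes 128 bytes, while a
128-byte request is left unchanged.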

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-25 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

> I'm not sure that it's a good idea to provide an "aligned malloc" fallback if 
> such a fallback would be inefficient. For example, we would have to 
> overallocate the memory block not only for the requested alignment, but also 
> allocate an extra sizeof(size_t) bytes, *in each* aligned memory block, to 
> store the size of the alignment itself, to recover the original pointer... to 
> finally be able to call free().

You can re-use the same bytes for padding and to store the offset. The main 
tricky thing is that for an alignment of N bytes you need to overallocate N 
bytes instead of (N-1). (In the worst case, malloc returns you a pointer that's 
already N-byte aligned, and then you have to advance it by a full N bytes so 
that you have some space before the pointer to store the offset.)

Also you want to do a little N = max(N, sizeof(whatever int type you use)) at 
the beginning to make sure it's always big enough, but this is trivial (and 
really even a uint16_t is plenty big to store all reasonable alignments). And 
make sure that N is a power-of-two, which guarantees that whatever value malloc 
returns will be shifted by at least malloc's regular alignment, which is 
guaranteed to be large enough to store a standard int type (on reasonable 
systems).
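
The scheme described above can be sketched as follows: over-allocate by N
bytes, always advance the pointer by between 1 and N bytes, and store that
offset in the padding just before the returned address. The names
padded_aligned_malloc(), padded_aligned_free() and pad_offset_t are
hypothetical, and limiting alignments to what fits in a uint16_t is an
assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t pad_offset_t;   /* assumption: offsets fit in 16 bits */

static void *
padded_aligned_malloc(size_t alignment, size_t size)
{
    /* Must be a power of 2, and small enough for the stored offset. */
    if (alignment == 0 || (alignment & (alignment - 1)) != 0
        || alignment > UINT16_MAX) {
        return NULL;
    }
    /* N = max(N, sizeof(offset type)) so the padding can hold the offset. */
    if (alignment < sizeof(pad_offset_t)) {
        alignment = sizeof(pad_offset_t);
    }
    if (size > SIZE_MAX - alignment) {
        return NULL;
    }

    unsigned char *raw = malloc(size + alignment);   /* N extra bytes */
    if (raw == NULL) {
        return NULL;
    }
    /* If raw is already aligned, advance by a full `alignment` bytes. */
    size_t offset = alignment - ((uintptr_t)raw & (alignment - 1));
    /* Holds because malloc() results are at least 2-byte aligned. */
    assert(offset >= sizeof(pad_offset_t));

    unsigned char *ptr = raw + offset;
    pad_offset_t stored = (pad_offset_t)offset;
    memcpy(ptr - sizeof(stored), &stored, sizeof(stored));
    return ptr;
}

static void
padded_aligned_free(void *ptr)
{
    if (ptr == NULL) {
        return;
    }
    pad_offset_t stored;
    memcpy(&stored, (unsigned char *)ptr - sizeof(stored), sizeof(stored));
    free((unsigned char *)ptr - stored);
}
```

The padding and the bookkeeping share the same bytes, so the overhead is
exactly `alignment` bytes per block rather than alignment plus
sizeof(size_t).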

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-25 Thread Xavier de Gaye

Xavier de Gaye  added the comment:

Android has both memalign() [1] and posix_memalign() [2] and does not have 
aligned_alloc(), posix_memalign() is a wrapper around memalign() [3].

[1] 
https://android.googlesource.com/platform/bionic/+/master/libc/include/malloc.h#38
[2] 
https://android.googlesource.com/platform/bionic/+/master/libc/include/stdlib.h#80
[3] https://android.googlesource.com/platform/bionic/+/85aad90%5E%21/

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-25 Thread Stefan Krah

Stefan Krah  added the comment:

> In Python 3.7, should we also add the "aligned alloc" requirement?

Linux, BSD, OSX, MSVC should be covered. According to Stackoverflow
MinGW has an internal function.

Android, I don't know. Xavier?

--
nosy: +xdegaye




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-25 Thread STINNER Victor

STINNER Victor  added the comment:

Currently, the main question on my PR 4089 was raised by Antoine Pitrou:
"Do people have to provide aligned_alloc and aligned_free? Or can they leave 
those pointers NULL and get a default implementation?"

My reply: "Currently, you must provide all allocator functions, including 
aligned_alloc and aligned_free. Technically, we can implement a fallback, but 
I'm not sure that I want to do that :-)"

I'm not sure about that. I can imagine platforms which provide a special 
malloc/free and that's all: no calloc, posix_memalign or _aligned_malloc(). But 
is Python supposed to fill the holes? For example, implement calloc() as 
malloc()+memset()? Or is the user of the PyMem_SetAllocator() API responsible 
for reimplementing them?
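
For illustration, the malloc()+memset() fallback mentioned above could look
like the sketch below. The name fallback_calloc() is hypothetical, and
returning a non-NULL pointer for zero-sized requests is a choice made for this
sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Emulate calloc() for a platform that only provides malloc()/free(). */
static void *
fallback_calloc(size_t nelem, size_t elsize)
{
    if (elsize != 0 && nelem > SIZE_MAX / elsize) {
        return NULL;                 /* nelem * elsize would overflow */
    }
    size_t size = nelem * elsize;
    if (size == 0) {
        size = 1;                    /* avoid malloc(0), which may return NULL */
    }
    assert(size > 0);

    void *ptr = malloc(size);
    if (ptr != NULL) {
        memset(ptr, 0, size);        /* O(n) cost, unlike kernel-zeroed pages */
    }
    return ptr;
}
```

The memset() touches every byte, which is why such a fallback cannot match a
native calloc() for large, sparsely used blocks.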

In Python 3.5, we added the requirement of a working calloc().

In Python 3.7, should we also add the "aligned alloc" requirement?

In case of doubt, I prefer not to guess, and leave the decision to the caller 
of the API: require all functions to be implemented.

I'm not sure that it's a good idea to provide an "aligned malloc" fallback if 
such a fallback would be inefficient. For example, we would have to overallocate 
the memory block not only for the requested alignment, but also allocate an 
extra sizeof(size_t) bytes, *in each* aligned memory block, to store the size 
of the alignment itself, to recover the original pointer... to finally be able 
to call free().

An aligned memory block would look like: [A SSS DDD...DDD] where "A" 
are bytes lost for alignment, "SSS" are bytes storing the alignment size (size of 
"A" in this example), and "DDD...DDD" would be the actual data.

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-24 Thread Stefan Krah

Stefan Krah  added the comment:

[me]
> This weakens my use case somewhat [...]

I looked at Victor's patch, and thanks to the alignment <= ALIGNMENT
optimization it seems that always using the aligned_alloc() and
aligned_free() versions for a specific pointer is fast. Nice!

So I retract the weakening of my use case (still shame on Microsoft
for not implementing C11). :)

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-23 Thread Benjamin Peterson

Benjamin Peterson  added the comment:

Having the ability to allocate aligned memory could help avoid some undefined 
behavior. See #27987 (though we only need 16-byte alignment there).

--
nosy: +benjamin.peterson




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-23 Thread Stefan Krah

Stefan Krah  added the comment:

On Mon, Oct 23, 2017 at 09:16:08PM +, Antoine Pitrou wrote:
> > The Arrow memory format for example recommends 64 bit alignment.
> 
> I presume you mean 64 bytes?

Yes, I was typing too fast.

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-23 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

By the way:

> The Arrow memory format for example recommends 64 bit alignment.

I presume you mean 64 bytes?

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-23 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

There's also aligned calloc, which no native APIs support but is still quite 
useful.

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-23 Thread Stefan Krah

Stefan Krah  added the comment:

On Mon, Oct 23, 2017 at 05:16:53PM +, STINNER Victor wrote:
> Memory allocated by PyMem_AlignedAlloc() must be freed with 
> PyMem_AlignedFree().
> 
> We cannot reuse PyMem_Free(). On Windows, PyMem_AlignedAlloc() is implemented 
> with _aligned_malloc(), which requires the memory to be released with 
> _aligned_free().

Ah, too bad. Of course Windows does something different again. This weakens
my use case somewhat, but I guess it would still be nice to have the functions
(if you think it's maintainable).

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-23 Thread STINNER Victor

STINNER Victor  added the comment:

I added _PyTraceMalloc_Track() and _PyTraceMalloc_Untrack() private functions 
to the C API in Python 3.6. These functions were made public in Python 3.7: 
renamed to PyTraceMalloc_Track() and PyTraceMalloc_Untrack(). I made this 
change to allow numpy to trace memory allocations, to debug memory leaks.

numpy cannot use Python memory allocators because numpy requires aligned memory 
blocks which are required to use CPU SIMD instructions.


Stefan Krah:
> I think many people would welcome this in scientific computing: The Arrow 
> memory format for example recommends 64 bit alignment.

Ah, that's an interesting use case.

I created attached PR 4089 to implement PyMem_AlignedAlloc():

   void* PyMem_AlignedAlloc(size_t alignment, size_t size);
   void PyMem_AlignedFree(void *ptr);

Memory allocated by PyMem_AlignedAlloc() must be freed with PyMem_AlignedFree().

We cannot reuse PyMem_Free(). On Windows, PyMem_AlignedAlloc() is implemented 
with _aligned_malloc(), which requires the memory to be released with 
_aligned_free().
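
The pairing constraint can be illustrated with a thin portable wrapper. This
is only a sketch under stated assumptions, not the code from PR 4089: the
names sketch_aligned_alloc() and sketch_aligned_free() are hypothetical, and
treating a size of 0 as 1 is a choice made here.

```c
#if !defined(_MSC_VER) && !defined(_POSIX_C_SOURCE)
#  define _POSIX_C_SOURCE 200112L   /* expose posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(_MSC_VER)
#  include <malloc.h>               /* _aligned_malloc(), _aligned_free() */
#endif

/* Allocate `size` bytes aligned on `alignment`, which must be a power of 2
   and a multiple of sizeof(void *). Returns NULL on failure. */
static void *
sketch_aligned_alloc(size_t alignment, size_t size)
{
    if (size == 0) {
        size = 1;
    }
#if defined(_MSC_VER)
    return _aligned_malloc(size, alignment);   /* note the argument order */
#else
    void *ptr = NULL;
    if (posix_memalign(&ptr, alignment, size) != 0) {
        return NULL;
    }
    assert(((uintptr_t)ptr & (alignment - 1)) == 0);
    return ptr;
#endif
}

/* Must be used for every block returned by sketch_aligned_alloc(). */
static void
sketch_aligned_free(void *ptr)
{
#if defined(_MSC_VER)
    _aligned_free(ptr);    /* plain free() is not valid for these blocks */
#else
    free(ptr);             /* posix_memalign() blocks are released by free() */
#endif
}
```

On POSIX systems the dedicated free function is just free(), but on Windows
it is not, so portable callers have to keep track of which function allocated
each block.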


Raymond Hettinger:
>> Adding yet another API to allocate memory has a cost
> Please don't FUD this one to death.

Statistics (size) on my PR:

 Doc/c-api/memory.rst                          |  43 +-
 Doc/whatsnew/3.7.rst                          |   4 +
 Include/internal/mem.h                        |   6 +-
 Include/objimpl.h                             |   2 +
 Include/pymem.h                               |  16 +-
 .../2017-10-23-19-03-38.bpo-18835.8XEjtG.rst  |   9 +
 Modules/_testcapimodule.c                     |  83 ++-
 Modules/_tracemalloc.c                        | 131 +++-
 Objects/obmalloc.c                            | 616 +--
 9 files changed, 655 insertions(+), 255 deletions(-)

That's quite a big change "just to add PyMem_AlignedAlloc()" :-)

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-23 Thread STINNER Victor

Change by STINNER Victor :


--
pull_requests: +4059
stage: needs patch -> patch review




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-23 Thread Antoine Pitrou

Change by Antoine Pitrou :


--
versions: +Python 3.7 -Python 3.5




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-23 Thread Stefan Krah

Stefan Krah  added the comment:

Yes, I think it is partly convenience. I want to set ...

   ndt_mallocfunc = PyMem_Malloc;
   ndt_alignedallocfunc = PyMem_AlignedAlloc;
   ndt_callocfunc = PyMem_Calloc;
   ndt_reallocfunc = PyMem_Realloc;
   ndt_freefunc = PyMem_Free;

... so I can always just call ndt_free(), because there's only one memory
allocator.
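
The pattern of routing all allocations through replaceable function pointers
can be sketched without Python at all. Every name below is hypothetical (the
real ndtypes hooks are the ndt_* variables shown above), and only the malloc
and free hooks are modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* Replaceable hooks, defaulting to the C library allocator. */
static void *(*my_mallocfunc)(size_t) = malloc;
static void (*my_freefunc)(void *) = free;

/* Example replacement pair that counts live blocks. */
static size_t live_blocks = 0;

static void *
counting_malloc(size_t size)
{
    void *ptr = malloc(size);
    if (ptr != NULL) {
        live_blocks++;
    }
    return ptr;
}

static void
counting_free(void *ptr)
{
    if (ptr != NULL) {
        assert(live_blocks > 0);
        live_blocks--;
    }
    free(ptr);
}

/* Install both hooks together so every block is released by the
   allocator that created it. */
static void
use_counting_allocator(void)
{
    my_mallocfunc = counting_malloc;
    my_freefunc = counting_free;
}
```

The convenience comes from having a single free hook: callers never need to
know which allocation function produced a block, which is exactly what a
separate aligned free function would break.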


But the other part is that datashape allows specifying alignment regardless
of the size of the type.  Example:

>>> from ndtypes import *
>>> from xnd import *
>>> t = ndt("{a: int64, b: uint64, align=16}")
>>> xnd(t, {'a': 111, 'b': 222})



The xnd object essentially wraps a typed data pointer. In the above case, the
'align' keyword has the same purpose as gcc's __attribute__((aligned(16))).


There are several other cases in datashape where alignment can be specified
explicitly.


For the convenience case it would already help if PyMem_AlignedAlloc() did
*not* use the fast allocator, but just delegated to _aligned_malloc() (MSVC)
or aligned_alloc() (C11), ...

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-23 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Do you need aligned allocation even on small objects?  The Python allocator 
doesn't handle allocations > 512 bytes.

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-10-23 Thread Stefan Krah

Stefan Krah  added the comment:

I need this too. I would like to set this

https://github.com/plures/ndtypes/commit/c260fdbae707da0dfefef499621a0a9f37a3e509#diff-2402fff6223084b74d97237c0d620b29R50

to something like PyMem_AlignedAlloc(), because the Python allocator is faster.


I think many people would welcome this in scientific computing: The Arrow 
memory format for example recommends 64 bit alignment.

--
nosy: +skrah
resolution: rejected -> 
status: closed -> open




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2017-01-23 Thread STINNER Victor

STINNER Victor added the comment:

Antoine Pitrou: "Benchmarks and Intel's recommendation show that aligned 
allocation is actually important for AVX performance, and NumPy depends on 
CPython providing the right allocation APIs (for integration with tracemalloc): 
https://github.com/numpy/numpy/issues/5312"

I don't think that NumPy was ever fully integrated with tracemalloc.

Since Python 3.6, NumPy doesn't have to use Python memory allocators to benefit 
from the tracemalloc debugger: I added a new C API to manually track/untrack 
memory blocks which are not directly allocated by Python, see issue #26530. I 
implemented this feature for NumPy, but since I never got any feedback from 
NumPy, I left the API private.

Moreover, I also added a second feature to tracemalloc: it's now possible to 
track memory allocations in different address spaces. The feature was also 
designed for NumPy, which can allocate memory in the GPU address space. See 
issue #26588.

With these new tracemalloc features, I don't think that NumPy can still be used 
as a justification for requesting this feature in CPython core.

--

Raymond: "A principal use case would be PyObject pointers where we want to keep 
all or most of the data fields in the same cache line (i.e. the fields for 
list, tuple, dict, and set objects).

Deques would benefit from having the deque blocks aligned to 64byte boundaries 
and never crossing page boundaries.  Set entries would benefit from 32byte 
alignment."

Victor (me!): "Do you have an idea of performance benefit of memory alignment?"

Since Raymond never provided any evidence that a new aligned memory allocator 
would give a significant speedup, and this issue has been inactive for 2 years, 
I am closing it.

See also the change 6e16b0045cf1, it seems like Raymond doesn't plan to use 
this feature anymore.

--

If someone wants this feature, we need good reasons to implement it.

--
resolution:  -> rejected
status: open -> closed




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2015-01-15 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Due to the realloc() problem, I'm thinking that the best course for Numpy would 
be to use its own allocator wrapper like Nathaniel outlined.
Also, Numpy wants calloc() for faster allocation of zeroed arrays...

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18835
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2015-01-14 Thread William Scullin

Changes by William Scullin wscul...@gmail.com:


--
nosy: +wscullin




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2014-12-08 Thread Trent Nelson

Changes by Trent Nelson tr...@snakebite.org:


--
nosy: +trent




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2014-12-05 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Benchmarks and Intel's recommendation show that aligned allocation is actually 
important for AVX performance, and NumPy depends on CPython providing the right 
allocation APIs (for integration with tracemalloc): 
https://github.com/numpy/numpy/issues/5312

So I think for 3.5 we should start providing the APIs. Whether we use them in 
Python core is another discussion.

Nathaniel, what APIs would you need exactly? See Victor's proposal in 
msg196834.

--
nosy: +njs
type:  -> enhancement
versions: +Python 3.5 -Python 3.4




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2014-12-05 Thread STINNER Victor

STINNER Victor added the comment:

Windows provides:

void * _aligned_malloc(
    size_t size,
    size_t alignment
);

http://msdn.microsoft.com/en-US/library/8z34s9c6%28v=vs.80%29.aspx

How should we handle platforms which don't provide a memory allocator with an 
alignment? The simplest option is to return NULL (MemoryError).

Allocating more memory and skipping the first bytes may work, but how do we retrieve 
the original address in the function releasing the memory block?

What about Solaris, Mac OS X, FreeBSD, etc.?

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2014-12-05 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> How should we handle platforms which don't provide a memory allocator
> with an alignment? The simplest option is to return NULL (MemoryError).

Are there such platforms? posix_memalign() is a POSIX standard, even OpenBSD 
has it.

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2014-12-05 Thread STINNER Victor

STINNER Victor added the comment:

PyMem_GetAllocator() and PyMem_SetAllocator() have a domain parameter which can 
take 3 values: PYMEM_DOMAIN_RAW, PYMEM_DOMAIN_MEM and PYMEM_DOMAIN_OBJ.

I don't think that we need 3 flavors of allocators (PyMem_Raw, PyMem, PyObject).

Maybe the PYMEM_DOMAIN_RAW domain is enough: OS functions don't require the 
GIL. In this case, should we add a new pair of Get/Set functions with an 
associated structure? Or maybe PyMem_SetAllocator() may ignore the aligned 
members of the patched PyMemAllocatorEx structure for domains other than 
PYMEM_DOMAIN_RAW? And PyMem_GetAllocator() would fill members with NULL.

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2014-12-05 Thread Nathaniel Smith

Nathaniel Smith added the comment:

It's not terribly difficult to write a crude-but-effective aligned allocator on 
top of raw malloc:

def aligned_malloc(size, alignment):
    assert alignment < 255
    raw_pointer = (uint8*) malloc(size + alignment)
    shift = alignment - (raw_pointer % alignment)
    assert 0 < shift <= alignment
    aligned_pointer = raw_pointer + shift
    *(aligned_pointer - 1) = shift
    return aligned_pointer

def aligned_free(uint8* pointer):
    shift = *(pointer - 1)
    free(pointer - shift)
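
For reference, the pseudocode above can be written out in C roughly as follows. This is only a sketch: the names sketch_aligned_malloc and sketch_aligned_free are made up for illustration and are not part of any CPython or NumPy API, and failure is reported by returning NULL instead of asserting.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not a CPython API: over-allocate by `alignment`
   bytes and remember the shift in the byte just below the result. */
void *sketch_aligned_malloc(size_t size, size_t alignment)
{
    /* The shift is stored in one byte, so keep alignment below 255. */
    if (alignment == 0 || alignment >= 255) {
        return NULL;
    }
    /* Guard against overflow in size + alignment. */
    if (size > SIZE_MAX - alignment) {
        return NULL;
    }
    uint8_t *raw = malloc(size + alignment);
    if (raw == NULL) {
        return NULL;
    }
    /* shift is in 1..alignment, so aligned[-1] is always inside the block. */
    size_t shift = alignment - ((uintptr_t)raw % alignment);
    assert(shift > 0 && shift <= alignment);
    uint8_t *aligned = raw + shift;
    aligned[-1] = (uint8_t)shift;
    return aligned;
}

/* Hypothetical counterpart: recover the raw pointer from the stored shift. */
void sketch_aligned_free(void *pointer)
{
    if (pointer == NULL) {
        return;
    }
    uint8_t *aligned = pointer;
    free(aligned - aligned[-1]);
}
```

A block returned by sketch_aligned_malloc() must be released with sketch_aligned_free(), never with plain free(), which is the same constraint as discussed for the Win32 functions.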

But, this fallback and the official Win32 API both disallow the use of plain 
free() (like Victor points out in msg196834), so we can't just add an 
aligned_malloc slot to the PyMemAllocator struct. This kind of aligned 
allocation is effectively its own memory domain.

If native aligned allocation support were added to PyMalloc then it could 
potentially do better (e.g. by noticing that it already has a block on its 
freelist with the requested alignment and just returning that instead of 
overallocating). This might be the ideal solution for Raymond's use case, but I 
have no idea how much work it would be to mess around with PyMalloc innards.

Numpy doesn't currently use aligned allocation for anything, but we'd like to 
keep our options open. If we do end up using it in the future then there's a 
reasonable chance we might want to use it *without* the GIL held (e.g. for 
allocating temporary buffers inside C loops). OTOH we are also happy to 
implement the aligned allocation ourselves (either on top of the system APIs or 
directly) -- we just don't want to lose tracemalloc support when we do.

For numpy's purposes, I think the best approach would be to add a tracemalloc 
escape valve, with an interface like:

PyMem_RecordAlloc(const char* domain, void* tag, size_t quantity, 
PyMem_RecordRealloc(const char* domain, void* old_tag, void* new_tag, size_t new_quantity)
PyMem_RecordFree(const char* domain, void* tag)

where the idea is that after someone allocates memory (or potentially other 
discrete resources) directly without going through PyMem_*, they could then 
call these functions to tell tracemalloc what they just did.

This would be useful in a number of cases: in addition to tracking aligned 
allocations, it would make it possible to re-use the tracemalloc infrastructure 
to track GPU buffers allocated by CUDA/GPGPU-type code, mmap usage, hugetlbfs 
usage, etc. Potentially even open file descriptors if one wants to go there 
(seems pretty useful, actually).
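
To make the shape of such an interface concrete, here is a toy, self-contained sketch of a record table keyed by (domain, tag). Everything in it is hypothetical: the toy_* names, the fixed-size table and the toy_total() query are assumptions made for illustration, and none of it reflects how tracemalloc is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_RECORDS 64

/* One tracked resource. The domain string is not copied, so it must
   outlive the record (a string literal in this sketch). */
typedef struct {
    const char *domain;
    void *tag;
    size_t quantity;
    int used;
} toy_record;

static toy_record toy_table[TOY_MAX_RECORDS];

static toy_record *toy_find(const char *domain, void *tag)
{
    for (size_t i = 0; i < TOY_MAX_RECORDS; i++) {
        toy_record *rec = &toy_table[i];
        if (rec->used && rec->tag == tag && strcmp(rec->domain, domain) == 0) {
            return rec;
        }
    }
    return NULL;
}

/* Record a new resource; returns 0 on success, -1 if the table is full. */
int toy_record_alloc(const char *domain, void *tag, size_t quantity)
{
    for (size_t i = 0; i < TOY_MAX_RECORDS; i++) {
        if (!toy_table[i].used) {
            toy_table[i].domain = domain;
            toy_table[i].tag = tag;
            toy_table[i].quantity = quantity;
            toy_table[i].used = 1;
            return 0;
        }
    }
    return -1;
}

/* Update an existing record in place; returns -1 if old_tag is unknown. */
int toy_record_realloc(const char *domain, void *old_tag, void *new_tag,
                       size_t new_quantity)
{
    toy_record *rec = toy_find(domain, old_tag);
    if (rec == NULL) {
        return -1;
    }
    rec->tag = new_tag;
    rec->quantity = new_quantity;
    return 0;
}

/* Forget a record; returns -1 if the tag is unknown in this domain. */
int toy_record_free(const char *domain, void *tag)
{
    toy_record *rec = toy_find(domain, tag);
    if (rec == NULL) {
        return -1;
    }
    rec->used = 0;
    return 0;
}

/* Sum of the quantities currently recorded for one domain. */
size_t toy_total(const char *domain)
{
    size_t total = 0;
    for (size_t i = 0; i < TOY_MAX_RECORDS; i++) {
        if (toy_table[i].used && strcmp(toy_table[i].domain, domain) == 0) {
            total += toy_table[i].quantity;
        }
    }
    return total;
}
```

The tag is an opaque key here, so it could be a pointer returned by an aligned allocator, a GPU buffer handle, or anything else the caller can map to a void*.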

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2014-12-05 Thread Antoine Pitrou

Antoine Pitrou added the comment:

On 05/12/2014 23:15, STINNER Victor wrote:
>
> I don't think that we need 3 flavors of allocators (PyMem_Raw,
> PyMem, PyObject).
>
> Maybe the PYMEM_DOMAIN_RAW domain is enough: OS functions don't
> require the GIL. In this case, should we add a new pair of Get/Set
> functions with an associated structure?

How about a new domain instead? PYMEM_DOMAIN_RAW_ALIGNED?

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2014-12-05 Thread Nathaniel Smith

Nathaniel Smith added the comment:

Re: msg232219: If you go down the route of adding both aligned_malloc and 
aligned_free to the Allocator structure, I think you might as well support it 
for all domains? For the PyMem and PyObject domains you can just literally set 
the default functions to be PyMem_RawAlignedMalloc and PyMem_RawAlignedFree, 
and that leaves the door open to fancier implementations in the future (e.g. if 
PyMalloc starts implementing aligned allocation directly).

Re: msg23: Currently all the domains share the same vtable struct, though, 
whereas aligned allocator functions have different signatures. So you can't 
literally just add an entry to the existing domain enum and be done.

It also occurs to me that if numpy ever gets serious about using aligned memory 
then we might also need aligned_realloc (numpy allows arrays to be resized, 
sigh), which is possible to do but I *cannot* recommend python attempt to 
provide as a standard interface. (It's not available at all in POSIX.) I guess 
this is another argument that it might be best to just give numpy an escape 
valve and worry about CPython's internal needs separately.

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2014-12-05 Thread STINNER Victor

STINNER Victor added the comment:

You cannot just add a new domain because the function prototypes are
different (they need an extra alignment parameter). You have to add new members
to the structure or add a new structure.
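
A rough sketch of the "new members" option, assuming a POSIX system. The struct below is hypothetical: its first four members only imitate a ctx/malloc/realloc/free layout, the *_fn member names and the sketch_raw_* functions are invented for this example, and the aligned slots are simply backed by posix_memalign() and free().

```c
#if !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator table. The last two members are the additions:
   they cannot share a slot with malloc_fn because of the extra
   alignment parameter. */
typedef struct {
    void *ctx;
    void *(*malloc_fn)(void *ctx, size_t size);
    void *(*realloc_fn)(void *ctx, void *ptr, size_t size);
    void (*free_fn)(void *ctx, void *ptr);
    void *(*aligned_alloc_fn)(void *ctx, size_t alignment, size_t size);
    void (*aligned_free_fn)(void *ctx, void *ptr);
} sketch_allocator;

void *sketch_raw_malloc(void *ctx, size_t size)
{
    (void)ctx;
    return malloc(size);
}

void *sketch_raw_realloc(void *ctx, void *ptr, size_t size)
{
    (void)ctx;
    return realloc(ptr, size);
}

void sketch_raw_free(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);
}

void *sketch_raw_aligned_alloc(void *ctx, size_t alignment, size_t size)
{
    void *ptr = NULL;
    (void)ctx;
    /* posix_memalign() wants a power-of-two alignment that is also a
       multiple of sizeof(void *); it returns non-zero on failure. */
    return posix_memalign(&ptr, alignment, size) == 0 ? ptr : NULL;
}

void sketch_raw_aligned_free(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);  /* valid on POSIX; Windows would need _aligned_free() */
}

const sketch_allocator sketch_raw_allocator = {
    NULL,
    sketch_raw_malloc,
    sketch_raw_realloc,
    sketch_raw_free,
    sketch_raw_aligned_alloc,
    sketch_raw_aligned_free,
};
```

Keeping a separate aligned_free_fn slot, even though it is plain free() on POSIX, leaves room for platforms where aligned blocks need their own release function.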

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2014-04-26 Thread STINNER Victor

STINNER Victor added the comment:

It looks like a memory allocator with a given alignment would help numpy, for 
SIMD instructions:
https://mail.python.org/pipermail/python-dev/2014-April/134092.html
(but Numpy does not currently use aligned allocation, and it's not clear how 
important it is)

See also this old discussion on python-dev:
https://mail.python.org/pipermail/python-dev/2010-September/103911.html

Related to this website:
http://mallocv2.wordpress.com/

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2013-09-03 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> We don't have to align EVERY data structure.  But I do have immediate
> beneficial use cases for set tables and for data blocks in deque
> objects.

Can you explain what the use cases are, and post some benchmarking code?

Also, what would be the strategy? Would you align every set/deque, or only
the bigger ones?

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2013-09-03 Thread STINNER Victor

STINNER Victor added the comment:

Linux provides the following functions:

int posix_memalign(void **memptr, size_t alignment, size_t size);
void *valloc(size_t size);  # obsolete
void *memalign(size_t boundary, size_t size);  # obsolete

Windows provides the following functions:

void* _aligned_malloc(size_t size,  size_t alignment);
void _aligned_free(void *memblock);

_aligned_malloc() has a warning: Using free is illegal.

Do all platforms provide at least one function to allocate aligned memory? 
Windows, Mac OS X, FreeBSD, old operating systems, etc.? If not, how should we 
handle these operating systems? Refuse to start Python? Disable some 
optimizations? How should we check in the source code (ex: setobject.c) that 
aligned allocations are not supported? An #ifdef?

***

Because of the Windows API, we need at least two functions:

void* PyMem_MallocAligned(size_t size, size_t alignment);
void PyMem_FreeAligned(void *ptr);

The GIL must be held when calling these functions.
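
A minimal sketch of how such a pair could dispatch on the platform with an #ifdef. The sketch_* names are placeholders (not the proposed PyMem functions), only Windows and POSIX are covered, and there is no GIL or hook handling here.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>
#endif

/* Placeholder name: allocate `size` bytes aligned on `alignment`. */
void *sketch_malloc_aligned(size_t size, size_t alignment)
{
#ifdef _WIN32
    return _aligned_malloc(size, alignment);
#else
    void *ptr = NULL;
    /* posix_memalign() requires a power-of-two alignment that is a
       multiple of sizeof(void *) and returns non-zero on failure. */
    if (posix_memalign(&ptr, alignment, size) != 0) {
        return NULL;
    }
    assert((uintptr_t)ptr % alignment == 0);
    return ptr;
#endif
}

/* Placeholder name: the matching release function. On Windows the block
   must go back through _aligned_free(); on POSIX plain free() is fine. */
void sketch_free_aligned(void *ptr)
{
#ifdef _WIN32
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}
```

Callers only ever see the pair, so the "using free is illegal" restriction on Windows stays hidden behind sketch_free_aligned().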


Windows provides also a realloc function:

void* _aligned_realloc(void *memblock, size_t size,  size_t alignment);

If the OS does not provide a realloc function, we can reimplement it (malloc, 
memcpy, free).

void* PyMem_ReallocAligned(void *ptr, size_t size, size_t alignment);
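
One detail of the "malloc, memcpy, free" fallback: the realloc has to know how many bytes to copy, and the signature above does not pass the old size. The sketch below assumes a small hypothetical header stored just below the aligned address to remember the raw pointer and the size; the sketch_hdr_* names are invented for illustration and the alignment must be a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping stored just below the aligned address. */
typedef struct {
    void *raw;    /* pointer returned by malloc() */
    size_t size;  /* size requested by the caller */
} sketch_header;

void *sketch_hdr_malloc(size_t size, size_t alignment)
{
    /* Only power-of-two alignments are supported in this sketch. */
    if (alignment == 0 || (alignment & (alignment - 1)) != 0) {
        return NULL;
    }
    size_t extra = sizeof(sketch_header) + alignment - 1;
    if (size > SIZE_MAX - extra) {
        return NULL;
    }
    unsigned char *raw = malloc(size + extra);
    if (raw == NULL) {
        return NULL;
    }
    /* First aligned address that leaves room for the header. */
    uintptr_t start = (uintptr_t)raw + sizeof(sketch_header);
    uintptr_t rounded = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    unsigned char *aligned = raw + (rounded - (uintptr_t)raw);
    assert((uintptr_t)aligned % alignment == 0);
    /* memcpy avoids a misaligned store when alignment is very small. */
    sketch_header hdr = { raw, size };
    memcpy(aligned - sizeof hdr, &hdr, sizeof hdr);
    return aligned;
}

void sketch_hdr_free(void *ptr)
{
    if (ptr == NULL) {
        return;
    }
    sketch_header hdr;
    memcpy(&hdr, (unsigned char *)ptr - sizeof hdr, sizeof hdr);
    free(hdr.raw);
}

/* Realloc as malloc + memcpy + free, using the stored old size. */
void *sketch_hdr_realloc(void *ptr, size_t size, size_t alignment)
{
    if (ptr == NULL) {
        return sketch_hdr_malloc(size, alignment);
    }
    sketch_header hdr;
    memcpy(&hdr, (unsigned char *)ptr - sizeof hdr, sizeof hdr);
    void *fresh = sketch_hdr_malloc(size, alignment);
    if (fresh == NULL) {
        return NULL;  /* like realloc(), the old block stays valid */
    }
    memcpy(fresh, ptr, hdr.size < size ? hdr.size : size);
    sketch_hdr_free(ptr);
    return fresh;
}
```

Unlike a native realloc, this always copies, so growing a large block costs O(n) even when the allocator could have extended it in place.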

***

For the PEP 445: the new API is different from the PyMemAllocator structure 
because malloc and realloc functions have an extra alignment parameter. We 
can drop the parameter if the allocator always has the same alignment, but each 
object may require a different alignment?

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2013-09-01 Thread Charles-François Natali

Charles-François Natali added the comment:

> Please don't FUD this one to death.  Aligned memory access is
> sometimes important and we currently have no straight-forward
> way to achieve it.

I guess that a simple way to cut the discussion short would be to have a first 
implementation, and run some benchmarks to measure the benefits.

I can certainly see the benefit of cacheline-aligned data structures in 
multithreaded code (to avoid false sharing/cacheline bouncing): I'm really 
curious to see how much this would benefit in a single-threaded workload.

--
nosy: +neologix




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2013-08-31 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
title: Add aligned memroy variants to the suite of PyMem functions/macros -> 
Add aligned memory variants to the suite of PyMem functions/macros




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2013-08-31 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
Removed message: http://bugs.python.org/msg196692




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2013-08-31 Thread Raymond Hettinger

Raymond Hettinger added the comment:

> Adding yet another API to allocate memory has a cost

Please don't FUD this one to death.  Aligned memory access is sometimes 
important and we currently have no straight-forward way to achieve it.  If 
you're truly worried about adding a single new function to the public C API, we 
can create just a single internal function:  void * 
_PyMem_RawMallocAligned(size_t size, size_t alignment).

> aligning every data structure on a cacheline boundary 
> doesn't sound like a very good idea

We don't have to align EVERY data structure.  But I do have immediate 
beneficial use cases for set tables and for data blocks in deque objects.  I 
need this function and would appreciate your help in fitting it in nicely with 
the current memory management functions and macros.

--




[issue18835] Add aligned memory variants to the suite of PyMem functions/macros

2013-08-31 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Attaching a patch for what I would like to do with the alignment functions and 
macros.

--
keywords: +patch
Added file: http://bugs.python.org/file31543/align.diff
