Re: [Numpy-discussion] NEP 49: Data allocation strategies

2021-04-21 Thread Ralf Gommers
On Tue, Apr 20, 2021 at 2:18 PM Matti Picus wrote:

> I have submitted NEP 49 to enable user-defined allocation strategies for
> the ndarray.data homogeneous memory area. The implementation is in PR
> 17582 (https://github.com/numpy/numpy/pull/17582). Here is the text of
> the NEP:
>

Thanks Matti!


>
> Abstract
> --------
>
> The ``numpy.ndarray`` requires additional memory allocations
> to hold the ``numpy.ndarray.strides``, ``numpy.ndarray.shape`` and
> ``numpy.ndarray.data`` attributes. These attributes are specially allocated
> after creating the Python object in the ``__new__`` method. The ``strides``
> and ``shape`` are stored in a piece of memory allocated internally.
>
> This NEP proposes a mechanism to override the memory management strategy
> used for ``ndarray->data`` with user-provided alternatives. This allocation
> holds the array's data and can be very large. As accessing this data often
> becomes a performance bottleneck, custom allocation strategies to guarantee
> data alignment or pinning allocations to specialized memory hardware can
> enable hardware-specific optimizations.
>
> Motivation and Scope
> --------------------
>
> Users may wish to override the internal data memory routines with ones
> of their own. Two such use-cases are to ensure data alignment and to pin
> certain allocations to certain NUMA cores.
>
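
As a rough illustration of the alignment use case quoted above, the sketch below over-allocates a buffer and then exposes only the part that starts on a 64-byte boundary. It uses nothing but Python's ctypes. The helper name `aligned_buffer` and the 64-byte figure are assumptions made for illustration; this is not how the NEP or NumPy's allocator does it.

```python
import ctypes


def aligned_buffer(nbytes: int, alignment: int = 64):
    """Return a ctypes char array of `nbytes` whose address is aligned.

    Hypothetical helper: over-allocate by `alignment` bytes, then skip
    forward to the first address that is a multiple of `alignment`.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    raw = ctypes.create_string_buffer(nbytes + alignment)
    # Number of bytes to skip so that the start address becomes aligned.
    offset = (-ctypes.addressof(raw)) % alignment
    # from_buffer shares memory with `raw` and keeps it alive.
    return (ctypes.c_char * nbytes).from_buffer(raw, offset)


buf = aligned_buffer(1024, 64)
print(ctypes.addressof(buf) % 64, ctypes.sizeof(buf))
```

A user-provided allocator would let this kind of guarantee apply to the array data itself rather than to a wrapper around it.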

It would be great to expand a bit on these two sentences, and add some
links. There's a lot of history here in NumPy development to refer to as
well:

https://numpy-discussion.scipy.narkive.com/MvmMkJcK/numpy-arrays-data-allocation-and-simd-alignement
http://numpy-discussion.10968.n7.nabble.com/Aligned-configurable-memory-allocation-td39712.html
http://numpy-discussion.10968.n7.nabble.com/Numpy-s-policy-for-releasing-memory-td1533.html
https://github.com/numpy/numpy/issues/5312
https://github.com/numpy/numpy/issues/14177

There must also be a good amount of ideas/discussion elsewhere.

https://bugs.python.org/issue18835 discussed an aligned allocator for
Python itself, with fairly detailed discussion about whether/how NumPy
could benefit. The conclusion (I think) was that it shouldn't be in Python,
and that NumPy/Arrow/others are better off doing their own thing.

I'm wondering if improved memory profiling is a use case as well? Fil (
https://github.com/pythonspeed/filprofiler) for example seems to use such a
strategy:
https://github.com/pythonspeed/filprofiler/blob/master/design/allocator-overrides.md

Does it interact with our tracemalloc support (
https://numpy.org/doc/stable/release/1.13.0-notes.html#support-for-tracemalloc-in-python-3-6
)?


> Users who wish to change the NumPy data memory management routines will use
>

This is design, not motivation or scope. Try to not refer to specific
function names in this section. I suggest moving this content to the
"Detailed design" section (or better, a "high level design" at the start of
that section).

Cheers,
Ralf



> :c:func:`PyDataMem_SetHandler`, which uses a :c:type:`PyDataMem_Handler`
> structure to hold pointers to functions used to manage the data memory. The
> calls are wrapped by internal routines to call
> :c:func:`PyTraceMalloc_Track`, :c:func:`PyTraceMalloc_Untrack`, and will
> use the :c:func:`PyDataMem_EventHookFunc` mechanism already present in
> NumPy for auditing purposes.
>
> Since a call to ``PyDataMem_SetHandler`` will change the default functions,
> and that function may be called during the lifetime of an ``ndarray``
> object, each ``ndarray`` will carry with it the ``PyDataMem_Handler`` struct
> used at the time of its instantiation, and these will be used to reallocate
> or free the data memory of the instance. Internally NumPy may use
> ``memcpy`` or ``memset`` on the data ``ptr``.
>
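
The per-array handler idea in the quoted paragraph can be modelled in a few lines of pure Python. This is only a conceptual sketch: `Handler`, `set_handler`, `Array` and the `log` list are hypothetical names standing in for the C struct and functions, and the real C-API signatures may differ.

```python
from dataclasses import dataclass
from typing import Callable

log = []  # records the name of the handler that freed each buffer


@dataclass(frozen=True)
class Handler:
    """Hypothetical stand-in for the PyDataMem_Handler struct."""
    name: str
    alloc: Callable[[int], bytearray]
    free: Callable[[bytearray], None]


def make_handler(name: str) -> Handler:
    return Handler(
        name=name,
        alloc=lambda size: bytearray(size),
        free=lambda buf: log.append(name),
    )


_current = make_handler("default")


def set_handler(handler: Handler) -> Handler:
    """Install a new process-wide handler and return the previous one."""
    global _current
    previous, _current = _current, handler
    return previous


class Array:
    def __init__(self, size: int):
        # Capture the handler that is active at instantiation time.
        self.handler = _current
        self.data = self.handler.alloc(size)

    def release(self) -> None:
        # Free with the handler that allocated the data, not the current one.
        self.handler.free(self.data)
        self.data = None


a = Array(16)                        # allocated by "default"
set_handler(make_handler("custom"))  # handler changes while `a` is alive
b = Array(16)                        # allocated by "custom"
a.release()
b.release()
print(log)  # ['default', 'custom']
```

Storing the handler on the instance is what keeps allocation and deallocation paired even when the global handler changes in between, at the cost of a larger object.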
> Usage and Impact
> ----------------
>
> The new functions can only be accessed via the NumPy C-API. An example is
> included later in the NEP. The added ``struct`` will increase the size of
> the ``ndarray`` object. It is one of the major drawbacks of this approach.
> We can be reasonably sure that the change in size will have a minimal
> impact on end-user code because NumPy version 1.20 already changed the
> object size.
>
> Backward compatibility
> ----------------------
>
> The design will not break backward compatibility. Projects that were
> assigning to the ``ndarray->data`` pointer were already breaking the current
> memory management strategy (backed by ``npy_alloc_cache``) and should
> restore ``ndarray->data`` before calling ``Py_DECREF``. As mentioned above,
> the change in size should not impact end-users.
>
> Matti
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


[Numpy-discussion] Add smallest_normal and smallest_subnormal attributes to finfo

2021-04-21 Thread Stephannie Jiménez Gacha
Good afternoon,

Given the discussions that happened in the Data API consortium when looking
into the attributes of `finfo` used in the wild, we found that `tiny` is used
regularly, but in a good number of cases not for its intended purpose but
rather as "just give me a small number". Following this, we are proposing
the addition of `smallest_normal` and `smallest_subnormal` attributes.
Personally, I think that the `tiny` name is a little bit odd and
misleading, so it would be great to leave that as an alias but have a clear
name in this class.

Right now the PR (https://github.com/numpy/numpy/pull/18536) has all the
changes, and all the values added were checked against the IEEE-754 standard.
One of the main concerns is the support of subnormal numbers on certain
architectures, where the values can't be calculated accurately. Given the
state of the discussion, we don't know if the best alternative is to not
add the `smallest_subnormal` attribute and just add the `smallest_normal`
attribute as an alias to `tiny`.
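
To make the two quantities concrete, here is a small sketch for IEEE-754 double precision using only the Python standard library (`math.ulp` needs Python 3.9 or later). The variable names mirror the proposed attributes, but this is an illustration of the values, not the PR's implementation. The helper `is_subnormal` is hypothetical, and on hardware or builds that flush subnormals to zero the results may differ.

```python
import math
import sys

# Smallest positive normal float64, 2**-1022. For float64 this is the
# value that finfo's `tiny` reports.
smallest_normal = sys.float_info.min

# Smallest positive subnormal float64, 2**-1074.
smallest_subnormal = math.ulp(0.0)


def is_subnormal(x: float) -> bool:
    """True if x is nonzero but smaller in magnitude than the smallest normal."""
    return x != 0.0 and abs(x) < sys.float_info.min


print(smallest_normal)
print(smallest_subnormal)
print(is_subnormal(smallest_normal / 2))
```

Code that only wants "a small number" usually does not care which of the two it gets, which is part of why a clearer name than `tiny` helps.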

We are opening this up for discussion to see which way we can go in order to
get this PR merged.

*Stephannie Jimenez Gacha*
Software developer

*Quansight* | Your Data Experts

w: www.quansight.com  e: sga...@quansight.com




Re: [Numpy-discussion] NEP 49: Data allocation strategies

2021-04-21 Thread Matti Picus
See my comments interspersed in Ralf's reply. Thanks for the additional 
context.


Matti


On 21/4/21 3:10 am, Ralf Gommers wrote:



> ...
>
> > Motivation and Scope
> >
> > Users may wish to override the internal data memory routines with ones
> > of their own. Two such use-cases are to ensure data alignment and to pin
> > certain allocations to certain NUMA cores.
>
> It would be great to expand a bit on these two sentences, and add some
> links. There's a lot of history here in NumPy development to refer to
> as well:
>
> https://numpy-discussion.scipy.narkive.com/MvmMkJcK/numpy-arrays-data-allocation-and-simd-alignement
> http://numpy-discussion.10968.n7.nabble.com/Aligned-configurable-memory-allocation-td39712.html
> http://numpy-discussion.10968.n7.nabble.com/Numpy-s-policy-for-releasing-memory-td1533.html
> https://github.com/numpy/numpy/issues/5312
> https://github.com/numpy/numpy/issues/14177
>
> There must also be a good amount of ideas/discussion elsewhere.



I added more context to this section, trying to focus on the large data 
allocations in NumPy.





> https://bugs.python.org/issue18835 discussed an aligned allocator
> for Python itself, with fairly detailed discussion about whether/how
> NumPy could benefit. The conclusion (I think) was that it shouldn't be in
> Python, and that NumPy/Arrow/others are better off doing their own thing.
>
> I'm wondering if improved memory profiling is a use case as well? Fil
> (https://github.com/pythonspeed/filprofiler) for example seems to use
> such a strategy:
> https://github.com/pythonspeed/filprofiler/blob/master/design/allocator-overrides.md




Thanks. I added a sentence about this as well.


> Does it interact with our tracemalloc support
> (https://numpy.org/doc/stable/release/1.13.0-notes.html#support-for-tracemalloc-in-python-3-6)?




I added a sentence about this. The new C-API wrapper functions preserve 
the current status vis-a-vis tracemalloc support. I am not sure that 
support is complete. The NEP should not change the situation for better 
or worse.
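
For readers unfamiliar with tracemalloc, the sketch below shows the kind of check one could run to see whether a large allocation is visible to it. It uses only the standard library, with a `bytearray` standing in for an array's data buffer. Whether NumPy's own data allocations are reported as completely is the open question above, so treat this as an assumption-laden illustration and not as a test of NumPy.

```python
import tracemalloc

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()

payload = bytearray(1_000_000)  # stand-in for a large array data buffer

after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

traced_growth = after - before
print(f"traced growth: {traced_growth} bytes (peak {peak})")
```

If a custom allocation routine bypassed the tracking wrappers, the traced growth would not reflect the buffer size, which is why the wrappers matter.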





> > Users who wish to change the NumPy data memory management routines
> > will use
>
> This is design, not motivation or scope. Try to not refer to specific
> function names in this section. I suggest moving this content to the
> "Detailed design" section (or better, a "high level design" at the
> start of that section).



Done.



> Cheers,
> Ralf


