[Numpy-discussion] Re: writing a known-size 1D ndarray serially as it's calced

2022-08-23 Thread Michael Siebert
Hi all,

I've made the pip/conda package npy-append-array for exactly this purpose; see

https://github.com/xor2k/npy-append-array

It works with one-dimensional arrays too, of course. The key challenge is to
properly initialize the header and update it as the array grows, which
my module takes care of. I'd like to integrate this functionality directly into
NumPy; see PR

https://github.com/numpy/numpy/pull/20321/

but I have been busy and have not received any feedback recently. A more
direct integration into NumPy would make it possible to skip or simplify the
header update, e.g. by introducing a new file format version. This could turn
.npy into a sort of binary CSV equivalent, where the size of the array is
determined by the file size.
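The "write first, fix the header afterwards" idea can be sketched as follows, assuming a 1-D, C-ordered array whose final length is not known up front. This is not npy-append-array's API: the helper name `append_chunks_then_fix_header` is hypothetical, and the sketch only uses `numpy.lib.format.write_array_header_2_0` and `dtype_to_descr`. It writes a provisional header with shape `(0,)`, streams the chunks with `tofile`, and then overwrites the header in place. That is only safe if the new header has exactly the same byte length as the old one, so the sketch checks this instead of assuming it.

```python
import io

import numpy as np
from numpy.lib import format as npformat


def append_chunks_then_fix_header(path, dtype, chunks):
    """Stream 1-D chunks of unknown total length into a .npy file.

    Hypothetical helper: writes a provisional header with shape (0,),
    appends the raw chunk bytes, then patches the header in place.
    """
    dtype = np.dtype(dtype)
    header = {
        "descr": npformat.dtype_to_descr(dtype),
        "fortran_order": False,
        "shape": (0,),  # provisional; the real length is not known yet
    }
    total = 0
    with open(path, "wb") as fp:
        npformat.write_array_header_2_0(fp, header)
        data_offset = fp.tell()  # raw data starts right after the header
        for chunk in chunks:
            chunk = np.ascontiguousarray(chunk, dtype=dtype).ravel()
            chunk.tofile(fp)
            total += chunk.size

    # Build the final header in memory first so its length can be checked
    # before touching the file: it must end exactly where the data begins.
    header["shape"] = (total,)
    buf = io.BytesIO()
    npformat.write_array_header_2_0(buf, header)
    new_header = buf.getvalue()
    if len(new_header) != data_offset:
        raise RuntimeError("final header would not fit in the reserved space")

    with open(path, "r+b") as fp:
        fp.write(new_header)  # overwrite only the header bytes at offset 0
    return total
```

For example, `append_chunks_then_fix_header("out.npy", np.float64, batches)` returns the number of elements written, after which `np.load("out.npy")` sees the full length. The `.npy` header is padded so that the data starts on a 64-byte boundary, which usually leaves room for the longer shape string, but the explicit length check is what actually guarantees the data offset is unchanged.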

Best, Michael

> On 24. Aug 2022, at 03:04, Robert Kern  wrote:
> 
> On Tue, Aug 23, 2022 at 8:47 PM  wrote:
>> I want to calculate multiple ndarrays at once but lack the memory, so I
>> want to write in chunks (here sized to GPU batch capacity). It seems there
>> should be an interface to write the header, then write a number of
>> elements cyclically, then add any closing rubric and close the file.
>> 
>> Is it as simple as `lib.format.write_array_header_2_0(fp, d)`
>> then writing multiple shape (N,) arrays of float with `fp.write(item.tobytes())`?
>  
> `item.tofile(fp)` is more efficient, but yes, that's the basic scheme. There 
> is no footer after the data.
> 
> The alternative is to use `np.lib.format.open_memmap(filename, mode='w+', 
> dtype=dtype, shape=shape)`, then assign slices sequentially to the returned 
> memory-mapped array. A memory-mapped array is usually going to be friendlier 
> to whatever memory limits you are running into than a nominally "in-memory" 
> array.
> 
> -- 
> Robert Kern
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: michael.sieber...@gmail.com


[Numpy-discussion] Re: writing a known-size 1D ndarray serially as it's calced

2022-08-23 Thread Robert Kern
On Tue, Aug 23, 2022 at 8:47 PM  wrote:

> I want to calculate multiple ndarrays at once but lack the memory, so I
> want to write in chunks (here sized to GPU batch capacity). It seems there
> should be an interface to write the header, then write a number of elements
> cyclically, then add any closing rubric and close the file.
>
> Is it as simple as `lib.format.write_array_header_2_0(fp, d)`
> then writing multiple shape (N,) arrays of float with
> `fp.write(item.tobytes())`?
>

`item.tofile(fp)` is more efficient, but yes, that's the basic scheme.
There is no footer after the data.
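The header-then-`tofile` scheme for a known final length can be sketched like this, assuming a 1-D array written in C order; the function name `write_known_size_npy` and its parameters are illustrative, not an existing NumPy API. The header is written once with the final shape, each batch is appended as raw bytes, and nothing follows the data.

```python
import numpy as np
from numpy.lib import format as npformat


def write_known_size_npy(path, n, dtype, batches):
    """Write a 1-D array of known length n to `path`, one batch at a time."""
    dtype = np.dtype(dtype)
    header = {
        "descr": npformat.dtype_to_descr(dtype),
        "fortran_order": False,
        "shape": (n,),  # final shape is known before any data is written
    }
    written = 0
    with open(path, "wb") as fp:
        npformat.write_array_header_2_0(fp, header)
        for batch in batches:
            # Make sure the bytes match the dtype declared in the header.
            batch = np.ascontiguousarray(batch, dtype=dtype).ravel()
            batch.tofile(fp)  # raw bytes, no per-batch header
            written += batch.size
    # There is no footer, so the only thing left to verify is the count.
    if written != n:
        raise ValueError(f"header declares {n} elements but {written} were written")
```

The final count check is there because the header is committed before any data exists; raising makes a mismatch between the declared shape and the bytes actually written visible immediately.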

The alternative is to use `np.lib.format.open_memmap(filename, mode='w+',
dtype=dtype, shape=shape)`, then assign slices sequentially to the returned
memory-mapped array. A memory-mapped array is usually going to be
friendlier to whatever memory limits you are running into than a nominally
"in-memory" array.
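The memory-mapped alternative can be sketched as follows. `np.lib.format.open_memmap` writes a valid `.npy` header for the requested shape and returns a writable memmap; the function name `fill_npy_via_memmap` and the `compute_batch(start, stop)` callback are hypothetical stand-ins for whatever produces each GPU-sized batch.

```python
import numpy as np


def fill_npy_via_memmap(path, n, dtype, batch_size, compute_batch):
    """Fill a new .npy file of shape (n,) batch by batch through a memmap."""
    out = np.lib.format.open_memmap(path, mode="w+", dtype=dtype, shape=(n,))
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # Assigning to a slice writes into the mapped file pages; the OS
        # decides when they are written back and evicted from memory.
        out[start:stop] = compute_batch(start, stop)
    out.flush()  # push any remaining dirty pages to disk
```

The design choice here is to let the operating system's page cache decide what stays resident, rather than holding the whole array in process memory.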

-- 
Robert Kern


[Numpy-discussion] writing a known-size 1D ndarray serially as it's calced

2022-08-23 Thread bross_phobrain
I want to calculate multiple ndarrays at once but lack the memory, so I want to
write in chunks (here sized to GPU batch capacity). It seems there should be an
interface to write the header, then write a number of elements cyclically, then
add any closing rubric and close the file.

Is it as simple as `lib.format.write_array_header_2_0(fp, d)`
then writing multiple shape (N,) arrays of float with `fp.write(item.tobytes())`?


[Numpy-discussion] ENH: ndarray.__format__ implementation for numeric dtypes

2022-08-23 Thread Inessa Pawson
We are looking for more feedback on this PR:
https://github.com/numpy/numpy/pull/19550

If you would like to contribute to the discussion, please leave a comment
in the Conversation section of the PR.

-- 
Cheers,
Inessa

Inessa Pawson
Contributor Experience Lead | NumPy
https://numpy.org/
GitHub: inessapawson


[Numpy-discussion] next NumPy Newcomers' Hour

2022-08-23 Thread Inessa Pawson
Our next Newcomers' Hour will be held this Thursday, August 25th, at 2 pm
UTC. Stop by to ask questions or just to say hi.

To add the topics you’d like to discuss to the meeting agenda, follow the
link: https://hackmd.io/3f3otyyuTte3FU9y3QzsLg?both

Join the meeting via Zoom: https://us02web.zoom.us/j/87192457898

Cheers,
Inessa

Inessa Pawson
Contributor Experience Lead | NumPy
https://numpy.org/
GitHub: inessapawson


[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy

2022-08-23 Thread Sebastian Berg
On Tue, 2022-08-23 at 14:00 +0200, Petr Viktorin wrote:
> On 23. 08. 22 11:46, Sebastian Berg wrote:
> > On Tue, 2022-08-23 at 03:16 +0300, Matti Picus wrote:
> > > 
> > > On 22/8/22 18:59, Eric Snow wrote:
> > > > Hi all,
> > > > 
> > 
> > 
> > 
> > > > devs than just me.  Do you have any preference for or against any
> > > > particular venue?
> > > > 
> > > > Thanks!
> > > > 
> > > > -eric
> > > 
> > > Thanks for starting the conversation. I would personally prefer the
> > > discussion about NumPy be here, general discussions could be
> > > elsewhere.
> > > 
> > > 
> > > Please correct me if I am wrong: I understand that multiple
> > > interpreters would require us to (at least):
> > 
> > 
> > These days, I was somewhat hoping that the HPy effort might give us
> > subinterpreters without having two separate efforts going on at the
> > same time, since much of the refactoring is probably identical between
> > the two and it seemed some significant effort might go into that.
> > 
> > But of course starting with subinterpreter support without HPy
> > probably also helps the HPy effort.
> 
> Both should help each other.
> 
> > > - refactor all the static module global state in NumPy and make it
> > > re-entrant or immortal including converting stack-allocated
> > > PyTypeObjects to heap types.
> > 
> > What is the status of immortality?  None of these seem forbidding at
> > first sight, as long as we can get the state everywhere.
> > Having immortal objects seems convenient, but probably not particularly
> > necessary.
> > 
> > Most of our state is currently in static variables in functions
> > (usually filled in dynamically at first call).  That is very convenient
> > since it doesn't require a global list anywhere.
> > 
> > I suppose moving it to module-state may well require a global list (or
> > is there another nice pattern?).  But while tedious, it doesn't seem
> > problematic.
> 
> A struct for the module state is the state of the art, yes.
> 
> > Switching to heap types should not be a big deal I suspect.
> > 
> > > 
> > > - find a mechanism to access the per-interpreter module state
> > > 
> > 
> > One thing I am not clear about is, e.g., the creation functions.  They
> > are public C-API so they have no way of getting a "self" or type/module
> > passed in.  How will such a function get the module state?
> > 
> > Now, we could likely replace those functions in the long run (or even
> > just remove many).  But it seems to me that we may need a
> > `PyType_GetModuleByDef()` that is passed _only_ the `module_def`?
> 
> Then you're looking at per-interpreter state, or thread-locals. That's
> problematic, e.g. you now need to handle clean-up at interpreter
> shutdown, and that isn't well supported. (Or leak -- AFAIK that's
> what NumPy currently does when Python's single interpreter is
> finalized?)
> I do urge you to assume that there can be multiple isolated NumPy
> modules created from a single def, even in a single interpreter. It's an
> additional constraint, but since it's conceptually simple I do think it
> makes up for itself in regularity/maintainability/reviewability.
> 
> And if the CPython API is lacking, it would be best to solve that in
> CPython.
> 


The issue is that we have public C-API that will be lacking the
necessary information, possibly quite deep in the API (I am not certain).

Now that I think about it, even something like the type is unclear to me.
`_Type` would not be per interpreter (unless we figure out
immortality).  But it exists as public API just like `Py_None`, etc.?

Our public C-API is currently exported as a single static struct into
the library loading NumPy.  If types depend on the interpreter, it
would seem we need to redo the whole mechanism?
Further, many of the functions would need to be adapted.  We might be
able to hack it so that the API looks the same [1].  However, it cannot be
ABI compatible, so we would need a whole new API table/export mechanism
and some sort of shim to allow compiling against older NumPy versions
but using it with all versions (otherwise we need 2+ years of
patience).

Of course there might be a point in saying that most C-API use is
initially not subinterpreter ready, but it does seem like a pretty huge
limitation...

Cheers,

Sebastian


[1] I.e. smuggle in module state without the library importing the
NumPy C-API having to change its code.


> > > - carefully consider places in the code that we steal references
> > > either intentionally or because that is the CPython C-API we are using
> > > 
> > 
> > This is an issue for HPy that needs to be cleared up, although I am
> > wondering how important it is for subinterpreters as such?

[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy

2022-08-23 Thread Petr Viktorin

On 23. 08. 22 11:46, Sebastian Berg wrote:
> On Tue, 2022-08-23 at 03:16 +0300, Matti Picus wrote:
> > 
> > On 22/8/22 18:59, Eric Snow wrote:
> > > Hi all,
> > > 
> > > 
> > > devs than just me.  Do you have any preference for or against any
> > > particular venue?
> > > 
> > > Thanks!
> > > 
> > > -eric
> > 
> > Thanks for starting the conversation. I would personally prefer the
> > discussion about NumPy be here, general discussions could be
> > elsewhere.
> > 
> > Please correct me if I am wrong: I understand that multiple
> > interpreters would require us to (at least):
> 
> These days, I was somewhat hoping that the HPy effort might give us
> subinterpreters without having two separate efforts going on at the
> same time, since much of the refactoring is probably identical between
> the two and it seemed some significant effort might go into that.
> 
> But of course starting with subinterpreter support without HPy probably
> also helps the HPy effort.

Both should help each other.

> > - refactor all the static module global state in NumPy and make it
> > re-entrant or immortal including converting stack-allocated
> > PyTypeObjects to heap types.
> 
> What is the status of immortality?  None of these seem forbidding at
> first sight, as long as we can get the state everywhere.
> Having immortal objects seems convenient, but probably not particularly
> necessary.
> 
> Most of our state is currently in static variables in functions
> (usually filled in dynamically at first call).  That is very convenient
> since it doesn't require a global list anywhere.
> 
> I suppose moving it to module-state may well require a global list (or
> is there another nice pattern?).  But while tedious, it doesn't seem
> problematic.

A struct for the module state is the state of the art, yes.

> Switching to heap types should not be a big deal I suspect.
> 
> > 
> > - find a mechanism to access the per-interpreter module state
> > 
> One thing I am not clear about is, e.g., the creation functions.  They
> are public C-API so they have no way of getting a "self" or type/module
> passed in.  How will such a function get the module state?
> 
> Now, we could likely replace those functions in the long run (or even
> just remove many).  But it seems to me that we may need a
> `PyType_GetModuleByDef()` that is passed _only_ the `module_def`?

Then you're looking at per-interpreter state, or thread-locals. That's
problematic, e.g. you now need to handle clean-up at interpreter
shutdown, and that isn't well supported. (Or leak -- AFAIK that's
what NumPy currently does when Python's single interpreter is finalized?)
I do urge you to assume that there can be multiple isolated NumPy
modules created from a single def, even in a single interpreter. It's an
additional constraint, but since it's conceptually simple I do think it
makes up for itself in regularity/maintainability/reviewability.

And if the CPython API is lacking, it would be best to solve that in
CPython.

> > - carefully consider places in the code that we steal references
> > either intentionally or because that is the CPython C-API we are using
> 
> This is an issue for HPy that needs to be cleared up, although I am
> wondering how important it is for subinterpreters as such?

Not important. Borrowed references work mainly to enable optimized
collections that don't store full PyObjects -- currently that's HPy
territory.
If you find the C API forcing you to steal references, I do want to
eventually fix that in CPython to make switching to HPy easy (and
eventually to enable the optimizations in CPython). A lot of “better”
alternative APIs were actually added in recent versions, and I'd welcome
requests for what to prioritize for Python 3.12+.

> > - measure the performance implications of the necessary changes
> > 
> > - plan forward/backward compatibility
> 
> One other thing I am not quite sure about right now is GIL grabbing.
> Will `PyGILState_Ensure()` continue to work reliably?
> This used to be one of my main worries.  It is also something we can
> fix up (pass through additional information), but where a fallback
> seems needed.

Per-interpreter GIL is an *additional* step. I believe it will need its
own opt-in mechanism. But subinterpreter support is a prerequisite for it.

So yes, PyGILState_Ensure will still acquire a global lock for you.

> Cheers,
> 
> Sebastian
> 
> > 
> > This seems like a significant undertaking, and is why we have
> > rejected casual calls for supporting multiple interpreters in the
> > past [2], [3], [4]. Supporting multiple interpreters is currently not
> > on the NumPy roadmap [0]. Priorities can be changed, through dialog
> > with the NumPy community, and others can propose changes to NumPy via
> > NEPs, PRs, and issues, but we are unlikely to engage directly in the
> > work if it is not an agreed-upon goal. There are other initiatives
> > around NumPy that may dovetail with multiple interpreters.

[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy

2022-08-23 Thread Sebastian Berg
On Tue, 2022-08-23 at 03:16 +0300, Matti Picus wrote:
> 
> On 22/8/22 18:59, Eric Snow wrote:
> > Hi all,
> > 



> > devs than just me.  Do you have any preference for or against any
> > particular venue?
> > 
> > Thanks!
> > 
> > -eric
> > ___
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: matti.pi...@gmail.com
> 
> Thanks for starting the conversation. I would personally prefer the
> discussion about NumPy be here, general discussions could be
> elsewhere.
> 
> 
> Please correct me if I am wrong: I understand that multiple
> interpreters would require us to (at least):


These days, I was somewhat hoping that the HPy effort might give us
subinterpreters without having two separate efforts going on at the
same time, since much of the refactoring is probably identical between
the two and it seemed some significant effort might go into that.

But of course starting with subinterpreter support without HPy probably
also helps the HPy effort.

> 
> - refactor all the static module global state in NumPy and make it 
> re-entrant or immortal including converting stack-allocated 
> PyTypeObjects to heap types.

What is the status of immortality?  None of these seem forbidding at
first sight, as long as we can get the state everywhere.
Having immortal objects seems convenient, but probably not particularly
necessary.

Most of our state is currently in static variables in functions
(usually filled in dynamically at first call).  That is very convenient
since it doesn't require a global list anywhere.

I suppose moving it to module-state may well require a global list (or
is there another nice pattern?).  But while tedious, it doesn't seem
problematic.

Switching to heap types should not be a big deal I suspect.

> 
> - find a mechanism to access the per-interpreter module state
> 

One thing I am not clear about is, e.g., the creation functions.  They
are public C-API so they have no way of getting a "self" or type/module
passed in.  How will such a function get the module state?

Now, we could likely replace those functions in the long run (or even
just remove many).  But it seems to me that we may need a
`PyType_GetModuleByDef()` that is passed _only_ the `module_def`?


> - carefully consider places in the code that we steal references
> either intentionally or because that is the CPython C-API we are using
> 

This is an issue for HPy that needs to be cleared up, although I am
wondering how important it is for subinterpreters as such?


> - measure the performance implications of the necessary changes
> 
> - plan forward/backward compatibility
> 


One other thing I am not quite sure about right now is GIL grabbing.
Will `PyGILState_Ensure()` continue to work reliably?
This used to be one of my main worries.  It is also something we can
fix up (pass through additional information), but where a fallback
seems needed.

Cheers,

Sebastian



> 
> This seems like a significant undertaking, and is why we have
> rejected casual calls for supporting multiple interpreters in the
> past [2], [3], [4]. Supporting multiple interpreters is currently not
> on the NumPy roadmap [0]. Priorities can be changed, through dialog
> with the NumPy community, and others can propose changes to NumPy via
> NEPs, PRs, and issues, but we are unlikely to engage directly in the
> work if it is not an agreed-upon goal. There are other initiatives
> around NumPy that may dovetail with multiple interpreters. For
> instance the HPy group hit many of the issues above when creating a
> port of NumPy [5]. It would be good to get like-minded people talking
> about this and to pool resources; maybe someone on this list has a
> strong opinion and would be willing to put in some work on the subject.
> 
> 
> One thing CPython could do is to provide clear documentation on how to
> port a small C extension module [1]
> 
> 
> Matti
> 
> 
> [0] https://numpy.org/neps/roadmap.html
> 
> [1] https://github.com/python/cpython/issues/79601
> 
> [2] https://github.com/numpy/numpy/issues/665
> 
> [3] https://github.com/numpy/numpy/issues/14384
> 
> [4] https://github.com/numpy/numpy/issues/16963
> 
> [5] https://github.com/hpyproject/numpy-hpy/tree/graal-team/hpy#readme
> 

[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy

2022-08-23 Thread Petr Viktorin




On 23. 08. 22 10:02, Matti Picus wrote:
> 
> On 23/8/22 03:16, Matti Picus wrote:
> > 
> > ...
> > 
> > One thing CPython could do is to provide clear documentation on how to
> > port a small C extension module [1]
> > 
> > Matti
> > [1] https://github.com/python/cpython/issues/79601
> 
> I should have searched the documentation; there is now quite an extensive
> guide [2] including all the different interfaces provided for getting
> per-interpreter module state.

Nothing to apologize for; it is only in the docs for the unreleased
3.11 :)
I'd be happy to answer questions and clarify things. Please let me know
if the written text lets you down.

> Matti
> 
> [2] https://docs.python.org/3.11/howto/isolating-extensions.html
> 
> [3] https://docs.python.org/3.11/c-api/type.html#c.PyType_GetModuleState



[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy

2022-08-23 Thread Matti Picus



On 23/8/22 03:16, Matti Picus wrote:
> 
> ...
> 
> One thing CPython could do is to provide clear documentation on how to
> port a small C extension module [1]
> 
> Matti
> [1] https://github.com/python/cpython/issues/79601

I should have searched the documentation; there is now quite an extensive
guide [2] including all the different interfaces provided for getting
per-interpreter module state.

Matti

[2] https://docs.python.org/3.11/howto/isolating-extensions.html

[3] https://docs.python.org/3.11/c-api/type.html#c.PyType_GetModuleState
