[Numpy-discussion] Re: writing a known-size 1D ndarray serially as it's calced
Hi all,

I've made the Pip/Conda module npy-append-array for exactly this purpose, see https://github.com/xor2k/npy-append-array

It works with one-dimensional arrays, too, of course. The key challenge is to properly initialize the header and update it accordingly as the array grows, which my module takes care of.

I'd like to integrate this functionality directly into NumPy (see PR https://github.com/numpy/numpy/pull/20321/), but I have been busy and have not received any feedback recently. A more direct integration into NumPy would make it possible to skip or simplify the header update, e.g. by introducing a new file format version. This could turn .npy into a sort of binary CSV equivalent, where the size of the array is determined by the file size.

Best,
Michael

> On 24. Aug 2022, at 03:04, Robert Kern wrote:
>
> On Tue, Aug 23, 2022 at 8:47 PM wrote:
>> I want to calc multiple ndarrays at once and lack memory, so want to write
>> in chunks (here sized to GPU batch capacity). It seems there should be an
>> interface to write the header, then write a number of elements cyclically,
>> then add any closing rubric and close the file.
>>
>> Is it as simple as lib.format.write_array_header_2_0(fp, d)
>> then writing multiple shape(N,) arrays of float by fp.write(item.tobytes())?
>
> `item.tofile(fp)` is more efficient, but yes, that's the basic scheme. There
> is no footer after the data.
>
> The alternative is to use `np.lib.format.open_memmap(filename, mode='w+',
> dtype=dtype, shape=shape)`, then assign slices sequentially to the returned
> memory-mapped array. A memory-mapped array is usually going to be friendlier
> to whatever memory limits you are running into than a nominally "in-memory"
> array.
>
> --
> Robert Kern

___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com
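The header-update problem Michael describes can be illustrated with a minimal sketch. It is not the npy-append-array implementation: it assumes a 1-D, C-ordered array and reserves a fixed-size version 1.0 .npy header (the 128-byte `HEADER_TOTAL` is an arbitrary, hypothetical choice), so the shape can be rewritten in place on close without moving any data. `NpyAppender` and `load_by_file_size` are hypothetical names; the latter illustrates the "size determined by the file size" idea by ignoring the stored shape entirely.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format

MAGIC = b"\x93NUMPY"
HEADER_TOTAL = 128  # hypothetical reserved header size (a multiple of 64)


def _npy_v1_header(dtype, length):
    """Build a version 1.0 .npy header that always occupies HEADER_TOTAL bytes."""
    descr = npy_format.dtype_to_descr(dtype)
    text = "{'descr': %r, 'fortran_order': False, 'shape': (%d,), }" % (descr, length)
    # Preamble: 6-byte magic, 2 version bytes, little-endian uint16 header length.
    body_len = HEADER_TOTAL - len(MAGIC) - 2 - 2
    if len(text) + 1 > body_len:
        raise ValueError("shape no longer fits in the reserved header")
    text = text.ljust(body_len - 1) + "\n"  # pad with spaces, terminate with newline
    return MAGIC + bytes([1, 0]) + body_len.to_bytes(2, "little") + text.encode("latin1")


class NpyAppender:
    """Grow a 1-D .npy file chunk by chunk, rewriting the header's shape on close."""

    def __init__(self, path, dtype=np.float64):
        self.dtype = np.dtype(dtype)
        self.length = 0
        self.fp = open(path, "wb")
        self.fp.write(_npy_v1_header(self.dtype, 0))

    def append(self, chunk):
        chunk = np.ascontiguousarray(chunk, dtype=self.dtype).ravel()
        self.fp.write(chunk.tobytes())
        self.length += chunk.size

    def close(self):
        # The header keeps the same size, so overwriting it cannot shift the data.
        self.fp.seek(0)
        self.fp.write(_npy_v1_header(self.dtype, self.length))
        self.fp.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


def load_by_file_size(path):
    """Ignore the header's shape and infer the 1-D length from the file size."""
    with open(path, "rb") as fp:
        version = npy_format.read_magic(fp)
        if version == (1, 0):
            _, _, dtype = npy_format.read_array_header_1_0(fp)
        elif version == (2, 0):
            _, _, dtype = npy_format.read_array_header_2_0(fp)
        else:
            raise ValueError(f"unsupported .npy version {version}")
        n = (os.fstat(fp.fileno()).st_size - fp.tell()) // dtype.itemsize
        return np.fromfile(fp, dtype=dtype, count=n)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "grown.npy")
        with NpyAppender(path) as app:
            app.append(np.arange(3.0))
            app.append(np.arange(3.0, 7.0))
        print(np.load(path), load_by_file_size(path))
```

Reserving a fixed header size is one simple way to keep the data offset stable. The trade-off is that a crash before `close()` leaves a header that still declares the old length, which is where a reader that trusts the file size rather than the header can recover the data.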
[Numpy-discussion] Re: writing a known-size 1D ndarray serially as it's calced
On Tue, Aug 23, 2022 at 8:47 PM wrote:

> I want to calc multiple ndarrays at once and lack memory, so want to write
> in chunks (here sized to GPU batch capacity). It seems there should be an
> interface to write the header, then write a number of elements cyclically,
> then add any closing rubric and close the file.
>
> Is it as simple as lib.format.write_array_header_2_0(fp, d)
> then writing multiple shape(N,) arrays of float by
> fp.write(item.tobytes())?

`item.tofile(fp)` is more efficient, but yes, that's the basic scheme. There is no footer after the data.

The alternative is to use `np.lib.format.open_memmap(filename, mode='w+', dtype=dtype, shape=shape)`, then assign slices sequentially to the returned memory-mapped array. A memory-mapped array is usually going to be friendlier to whatever memory limits you are running into than a nominally "in-memory" array.

--
Robert Kern
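A sketch of the `open_memmap` alternative, assuming a 1-D output whose length is known in advance. `fill_npy_via_memmap` and `example_batch` are hypothetical names; `example_batch(start, stop)` stands in for the GPU batch computation from the original question.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap


def fill_npy_via_memmap(path, total_len, batch_size, compute_batch, dtype=np.float64):
    """Create a .npy file of known shape and fill it batch by batch."""
    # open_memmap writes the .npy header and returns a writable memory-mapped array.
    out = open_memmap(path, mode="w+", dtype=dtype, shape=(total_len,))
    try:
        for start in range(0, total_len, batch_size):
            stop = min(start + batch_size, total_len)
            # Assign one batch at a time; the OS pages the mapped file in and out.
            out[start:stop] = compute_batch(start, stop)
        out.flush()  # write dirty pages back to the file
    finally:
        del out  # release the mapping so the file can be reopened (matters on Windows)


def example_batch(start, stop):
    # Hypothetical stand-in for the real per-batch computation.
    return np.arange(start, stop, dtype=np.float64) ** 2


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "squares.npy")
        fill_npy_via_memmap(path, total_len=10, batch_size=4, compute_batch=example_batch)
        print(np.load(path))
```

With this approach the header is written once by `open_memmap`, so no manual header handling is needed, and the full result never has to exist as an ordinary in-memory ndarray.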
[Numpy-discussion] writing a known-size 1D ndarray serially as it's calced
I want to calc multiple ndarrays at once and lack memory, so want to write in chunks (here sized to GPU batch capacity). It seems there should be an interface to write the header, then write a number of elements cyclically, then add any closing rubric and close the file.

Is it as simple as lib.format.write_array_header_2_0(fp, d) then writing multiple shape(N,) arrays of float by fp.write(item.tobytes())?
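The scheme asked about here (write the header once, then append raw chunks, with nothing after the data) might look like the following sketch. It assumes a 1-D, C-ordered array whose total length is known before any chunk is computed. `write_npy_in_chunks` is a hypothetical helper, and the header dict `d` is filled with the three keys the .npy format expects (`descr`, `fortran_order`, `shape`). Following Robert Kern's reply in this thread, it uses `chunk.tofile(fp)` rather than `fp.write(chunk.tobytes())`.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def write_npy_in_chunks(path, chunks, total_len, dtype=np.float64):
    """Write a 1-D .npy file whose final length is known before any data exists."""
    dtype = np.dtype(dtype)
    # The three keys every .npy header dict must contain.
    header = {
        "descr": npy_format.dtype_to_descr(dtype),
        "fortran_order": False,
        "shape": (total_len,),
    }
    written = 0
    with open(path, "wb") as fp:
        npy_format.write_array_header_2_0(fp, header)
        for chunk in chunks:
            chunk = np.ascontiguousarray(chunk, dtype=dtype).ravel()
            if written + chunk.size > total_len:
                raise ValueError("chunks exceed the length declared in the header")
            chunk.tofile(fp)  # raw bytes only: no per-chunk header, no footer
            written += chunk.size
    if written != total_len:
        raise ValueError(f"wrote {written} of {total_len} declared elements")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "chunks.npy")
        batches = (np.arange(i, i + 4, dtype=np.float64) for i in range(0, 12, 4))
        write_npy_in_chunks(path, batches, total_len=12)
        print(np.load(path))
```

Because the header fixes the shape before any data is written, a mismatch between the declared and actual element count would otherwise only surface at load time; the helper checks the count as it writes instead.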
[Numpy-discussion] ENH: ndarray.__format__ implementation for numeric dtypes
We are looking for more feedback on this PR: https://github.com/numpy/numpy/pull/19550

If you would like to contribute to the discussion, please leave a comment in the Conversation section of the PR.

--
Cheers,
Inessa

Inessa Pawson
Contributor Experience Lead | NumPy
https://numpy.org/
GitHub: inessapawson
[Numpy-discussion] next NumPy Newcomers' Hour
Our next Newcomers' Hour will be held this Thursday, August 25th, at 2 pm UTC. Stop by to ask questions or just to say hi.

To add topics you'd like to discuss to the meeting agenda, follow the link: https://hackmd.io/3f3otyyuTte3FU9y3QzsLg?both

Join the meeting via Zoom: https://us02web.zoom.us/j/87192457898

Cheers,
Inessa

Inessa Pawson
Contributor Experience Lead | NumPy
https://numpy.org/
GitHub: inessapawson
[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy
On Tue, 2022-08-23 at 14:00 +0200, Petr Viktorin wrote: > On 23. 08. 22 11:46, Sebastian Berg wrote: > > On Tue, 2022-08-23 at 03:16 +0300, Matti Picus wrote: > > > > > > On 22/8/22 18:59, Eric Snow wrote: > > > > Hi all, > > > > > > > > > > > > > > devs than just me. Do you have any preference for or against > > > > any > > > > particular venue? > > > > > > > > Thanks! > > > > > > > > -eric > > > > > > Thanks for starting the conversation. I would personally prefer > > > the > > > discussion about NumPy be here, general discussions could be > > > elsewhere. > > > > > > > > > Please correct me if I am wrong: I understand that multiple > > > interpreters > > > would require us to (at least): > > > > > > These days, I was somewhat hoping that the HPy effort might give us > > subinterpreters without having two seperate efforts going on at the > > same time. Since much of the refactors are probably identical > > between > > the two and it seemed some significant effort might go into that. > > > > But of course starting with subinterpreter support without HPy > > probably > > also helps the HPy effort. > > Both should help each other. > > > > - refactor all the static module global state in NumPy and make > > > it > > > re-entrant or immortal including converting stack-allocated > > > PyTypeObjects to heap types. > > > > What is the status of immortality? None of these seem forbidding > > on > > first sight, so long that we can get the state everywhere. > > Having immortal object seems convenient, but probably not > > particularly > > necessary. > > > > Most of our state is currently in static variables in functions > > (usually filled in dynamically at first call).
That is very > > convenient > > since it doesn't require a global list anywhere. > > > > I suppose moving it to module-state may well require a global list > > (or > > is there a nice other pattern?). But while tedious, it doesn't > > seem > > problematic. > > A struct for the module state is the state of the art, yes. > > > Switching to heap types should not be a big deal I suspect. > > > > > > > > - find a mechanism to access the per-interpreter module state > > > > > > > One thing that I am not clear about are e.g. creation functions. > > They > > are public C-API so they have no way of getting a "self" or > > type/module > > passed in. How will such a function get the module state? > > > > Now, we could likely replace those functions in the long run (or > > even > > just remove many). But it seems to me that we may need a > > `PyType_GetModuleByDef()` that is passed _only_ the `module_def`? > > Then you're looking at per-interpreter state, or thread-locals. > That's > problematic, e.g. you now need to handle clean-up at interpreter > shutdown, and the that isn't well supported. (Or leak -- AFAIK that's > what NumPy currently does when Python's single interpreter is > finalized?) > I do urge you to assume that there can be multiple isolated NumPy > modules created from a single def, even in a single interpreter. It's > an > additional constraint, but since it's conceptually simple I do think > it > makes up for itself in regularity/maintainability/reviewability. > > And if the CPython API is lacking, it would be best to solve that in > CPython. >

The issue is that we have public C-API that will be lacking the necessary information. Maybe pretty deep API (I am not certain). Now that I think about it, even things like the type are unclear to me. `_Type` would not be per interpreter (unless we figure out immortality). But it exists as public API just like `Py_None`, etc.?
Our public C-API is currently exported as a single static struct into the library loading NumPy. If types depend on the interpreter, it would seem we need to redo the whole mechanism? Further, many of the functions would need to be adapted. We might be able to hack it so that the API looks the same [1]. However, it cannot be ABI compatible, so we would need a whole new API table/export mechanism and some sort of shim to allow compiling against older NumPy versions but using it with all versions (otherwise we need 2+ years of patience).

Of course there might be a point in saying that most C-API use is initially not subinterpreter ready, but it does seem like a pretty huge limitation...

Cheers,

Sebastian

[1] I.e. smuggle in module state without the library importing the NumPy C-API having to change its code.

> > > - carefully consider places in the code that we steal references
> > > either intentionally or because that is the CPython C-API we are using
> > >
> >
> > This is an issue for HPy that needs
[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy
On 23. 08. 22 11:46, Sebastian Berg wrote:
> On Tue, 2022-08-23 at 03:16 +0300, Matti Picus wrote:
>> On 22/8/22 18:59, Eric Snow wrote:
>>> Hi all,
>>>
>>> devs than just me. Do you have any preference for or against any
>>> particular venue?
>>>
>>> Thanks!
>>>
>>> -eric
>>
>> Thanks for starting the conversation. I would personally prefer the
>> discussion about NumPy be here, general discussions could be
>> elsewhere.
>>
>> Please correct me if I am wrong: I understand that multiple
>> interpreters would require us to (at least):
>
> These days, I was somewhat hoping that the HPy effort might give us
> subinterpreters without having two seperate efforts going on at the
> same time. Since much of the refactors are probably identical between
> the two and it seemed some significant effort might go into that.
>
> But of course starting with subinterpreter support without HPy
> probably also helps the HPy effort.

Both should help each other.

>> - refactor all the static module global state in NumPy and make it
>> re-entrant or immortal including converting stack-allocated
>> PyTypeObjects to heap types.
>
> What is the status of immortality? None of these seem forbidding on
> first sight, so long that we can get the state everywhere.
> Having immortal object seems convenient, but probably not particularly
> necessary.
>
> Most of our state is currently in static variables in functions
> (usually filled in dynamically at first call). That is very convenient
> since it doesn't require a global list anywhere.
>
> I suppose moving it to module-state may well require a global list (or
> is there a nice other pattern?). But while tedious, it doesn't seem
> problematic.

A struct for the module state is the state of the art, yes.

> Switching to heap types should not be a big deal I suspect.
>
>> - find a mechanism to access the per-interpreter module state
>
> One thing that I am not clear about are e.g. creation functions. They
> are public C-API so they have no way of getting a "self" or type/module
> passed in. How will such a function get the module state?
>
> Now, we could likely replace those functions in the long run (or even
> just remove many). But it seems to me that we may need a
> `PyType_GetModuleByDef()` that is passed _only_ the `module_def`?

Then you're looking at per-interpreter state, or thread-locals. That's problematic, e.g. you now need to handle clean-up at interpreter shutdown, and that isn't well supported. (Or leak -- AFAIK that's what NumPy currently does when Python's single interpreter is finalized?)

I do urge you to assume that there can be multiple isolated NumPy modules created from a single def, even in a single interpreter. It's an additional constraint, but since it's conceptually simple I do think it makes up for itself in regularity/maintainability/reviewability.

And if the CPython API is lacking, it would be best to solve that in CPython.

>> - carefully consider places in the code that we steal references
>> either intentionally or because that is the CPython C-API we are using
>
> This is an issue for HPy that needs to be cleared up, although I am
> wondering how important it is for subinterpreters as such?

Not important. Borrowed references work mainly to enable optimized collections that don't store full PyObjects -- currently that's HPy territory. If you find the C API forcing you to steal references, I do want to eventually fix that in CPython to make switching to HPy easy (and eventually to enable the optimizations in CPython). A lot of “better” alternative APIs were actually added in recent versions, and I'd welcome requests for what to prioritize for Python 3.12+.

>> - measure the performance implications of the necessary changes
>>
>> - plan forward/backward compatibility
>
> One other thing I am not quite sure about right now is GIL grabbing.
> `PyGILState_Ensure()` will continue to work reliably? This used to be
> one of my main worries. It is also something we can fix-up (pass
> through additional information), but where a fallback seems needed.

Per-interpreter GIL is an *additional* step. I believe it will need its own opt-in mechanism. But subinterpreter support is a prerequisite for it. So yes, PyGILState_Ensure will still acquire a global lock for you.

> Cheers,
>
> Sebastian
>
>> This seems like a significant undertaking, and is why we have rejected
>> casual calls for supporting multiple interpreters in the past [2], [3],
>> [4]. Supporting multiple interpreters is currently not on the NumPy
>> roadmap [0]. Priorities can be changed, through dialog with the NumPy
>> community, and others can propose changes to NumPy via NEPs, PRs, and
>> issues, but we are unlikely to engage directly in the work if it is not
>> an agreed upon goal. There are other initiatives around NumPy
[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy
On Tue, 2022-08-23 at 03:16 +0300, Matti Picus wrote:
>
> On 22/8/22 18:59, Eric Snow wrote:
> > Hi all,
> >
> > devs than just me. Do you have any preference for or against any
> > particular venue?
> >
> > Thanks!
> >
> > -eric
>
> Thanks for starting the conversation. I would personally prefer the
> discussion about NumPy be here, general discussions could be
> elsewhere.
>
> Please correct me if I am wrong: I understand that multiple
> interpreters would require us to (at least):

These days, I was somewhat hoping that the HPy effort might give us subinterpreters without having two separate efforts going on at the same time, since much of the refactoring is probably identical between the two, and it seemed some significant effort might go into that.

But of course starting with subinterpreter support without HPy probably also helps the HPy effort.

>
> - refactor all the static module global state in NumPy and make it
> re-entrant or immortal including converting stack-allocated
> PyTypeObjects to heap types.

What is the status of immortality? None of these seem forbidding at first sight, as long as we can get the state everywhere. Having immortal objects seems convenient, but probably not particularly necessary.

Most of our state is currently in static variables in functions (usually filled in dynamically at first call). That is very convenient since it doesn't require a global list anywhere.

I suppose moving it to module state may well require a global list (or is there a nice other pattern?). But while tedious, it doesn't seem problematic.

Switching to heap types should not be a big deal, I suspect.
>
> - find a mechanism to access the per-interpreter module state
>

One thing that I am not clear about is, e.g., creation functions. They are public C-API, so they have no way of getting a "self" or type/module passed in. How will such a function get the module state?

Now, we could likely replace those functions in the long run (or even just remove many). But it seems to me that we may need a `PyType_GetModuleByDef()` that is passed _only_ the `module_def`?

> - carefully consider places in the code that we steal references
> either intentionally or because that is the CPython C-API we are using
>

This is an issue for HPy that needs to be cleared up, although I am wondering how important it is for subinterpreters as such?

> - measure the performance implications of the necessary changes
>
> - plan forward/backward compatibility
>

One other thing I am not quite sure about right now is GIL grabbing. Will `PyGILState_Ensure()` continue to work reliably? This used to be one of my main worries. It is also something we can fix up (pass through additional information), but where a fallback seems needed.

Cheers,

Sebastian

>
> This seems like a significant undertaking, and is why we have rejected
> casual calls for supporting multiple interpreters in the past [2], [3],
> [4]. Supporting multiple interpreters is currently not on the NumPy
> roadmap [0]. Priorities can be changed, through dialog with the NumPy
> community, and others can propose changes to NumPy via NEPs, PRs, and
> issues, but we are unlikely to engage directly in the work if it is not
> an agreed upon goal. There are other initiatives around NumPy that may
> dovetail with multiple interpreters. For instance the HPy group hit many
> of the issues above when creating a port of NumPy [5]. It would be good
> to get like-minded people talking about this and to pool resources,
> maybe someone on this list has a strong opinion and would be willing to
> put in some work on the subject.
>
> One thing CPython could do is to provide clear documentation how to port
> a small c-extension module [1]
>
> Matti
>
> [0] https://numpy.org/neps/roadmap.html
> [1] https://github.com/python/cpython/issues/79601
> [2] https://github.com/numpy/numpy/issues/665
> [3] https://github.com/numpy/numpy/issues/14384
> [4] https://github.com/numpy/numpy/issues/16963
> [5] https://github.com/hpyproject/numpy-hpy/tree/graal-team/hpy#readme
[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy
On 23. 08. 22 10:02, Matti Picus wrote:
> On 23/8/22 03:16, Matti Picus wrote:
>> ...
>> One thing CPython could do is to provide clear documentation how to port
>> a small c-extension module [1]
>>
>> Matti
>>
>> [1] https://github.com/python/cpython/issues/79601
>
> I should have searched the documentation, there is now a quite extensive
> guide [2] including all the different interfaces provided for getting
> per-interpreter module state.

Nothing to apologize about, it is only in the docs for the unreleased 3.11 :)

I'd be happy to answer questions and clarify things. Please let me know if the written text lets you down.

> Matti
>
> [2] https://docs.python.org/3.11/howto/isolating-extensions.html
> [3] https://docs.python.org/3.11/c-api/type.html#c.PyType_GetModuleState
[Numpy-discussion] Re: Support for Multiple Interpreters (Subinterpreters) in numpy
On 23/8/22 03:16, Matti Picus wrote:
> ...
> One thing CPython could do is to provide clear documentation how to port
> a small c-extension module [1]
>
> Matti
>
> [1] https://github.com/python/cpython/issues/79601

I should have searched the documentation: there is now a quite extensive guide [2], including all the different interfaces provided for getting per-interpreter module state.

Matti

[2] https://docs.python.org/3.11/howto/isolating-extensions.html
[3] https://docs.python.org/3.11/c-api/type.html#c.PyType_GetModuleState