Re: [Numpy-discussion] Good use of __dunder__ methods in numpy

2020-03-24 Thread Abdur-Rahmaan Janhangeer
Thanks for the info!

Kind Regards,

Abdur-Rahmaan Janhangeer
compileralchemy.com  | github

Mauritius


On Mon, Mar 23, 2020 at 10:56 PM Chris Barker  wrote:

> On Thu, Mar 5, 2020 at 2:15 PM Gregory Lee  wrote:
>
>> If i can get a link to a file that shows how dunder methods help with
>>> having cool coding APIs that would be great!
>>>
>>>
>> You may want to take a look at PEP 465 as an example, then. If I recall
>> correctly, the __matmul__ method described in it was added to the standard
>> library largely with NumPy in mind.
>> https://www.python.org/dev/peps/pep-0465/
>>
>
> and so were "rich comparisons", and in-place operators (at least in part).
>
> numpy is VERY, VERY heavily built on the concept of overloading
> operators, i.e. using dunders or magic methods.
>
> I'm going to venture a guess that numpy arrays custom define every single
> standard dunder -- and certainly most of them.
>
> -CHB
>
>
>
>
>
>>> On Thu, Mar 5, 2020 at 10:32 PM Sebastian Berg <
>>> sebast...@sipsolutions.net> wrote:
>>>
 Hi,

 On Thu, 2020-03-05 at 11:14 +0400, Abdur-Rahmaan Janhangeer wrote:
 > Greetings list,
 >
 > I have a talk about dunder methods in Python
 >
 > (
 >
 https://conference.mscc.mu/speaker/67604187-57c3-4be6-987c-ea4bef388ad3
 > )
 >
 > and it would be nice to include Numpy in the mix. Can someone point
 > me to one or two use cases / file link where dunder methods help
 > numpy?
 >

 I am not sure exactly what you are looking for. NumPy has its own set
 of dunder methods (some of which should probably not be used very
 much), like `__array__`, `__array_interface__`, `__array_ufunc__`,
 `__array_function__`, `__array_finalize__`, ...
 So we use the `__array_*__` pattern for numpy-related dunders.

 Of course we also implement most of the Python-defined dunders, but I
 am not sure whether that is what you are looking for?

 Best,

 Sebastian


 > Thanks
 >
 > fun info: I am a tiny numpy contributor with a one-line merge.
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R   (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
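To make the thread's point concrete: a minimal, self-contained sketch
(toy class, not NumPy's implementation) of how Python's operator
dunders and NumPy's `__array__` protocol hook together:

    import numpy as np

    class Ledger:
        """Toy container -- illustrative only."""

        def __init__(self, values):
            self._values = list(values)

        def __array__(self, dtype=None):
            # np.asarray()/np.array() look for this dunder to convert
            # arbitrary objects into ndarrays.
            return np.array(self._values, dtype=dtype)

        def __add__(self, other):
            # Python routes `obj + other` here; ndarray defines the same hook.
            return Ledger(v + other for v in self._values)

        def __matmul__(self, other):
            # PEP 465's `@` operator, added largely with NumPy in mind.
            return sum(a * b for a, b in zip(self._values, other._values))

    a = Ledger([1, 2, 3])
    print(np.asarray(a))           # array([1, 2, 3])  via __array__
    print((a + 10)._values)        # [11, 12, 13]      via __add__
    print(a @ Ledger([4, 5, 6]))   # 32                via __matmul__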


Re: [Numpy-discussion] Proposal: NEP 41 -- First step towards a new Datatype System

2020-03-24 Thread Francesc Alted
On Mon, Mar 23, 2020 at 9:49 PM Sebastian Berg 
wrote:

> On Mon, 2020-03-23 at 18:23 +0100, Francesc Alted wrote:
> 
> > > If we were designing a new programming language around array
> > > computing
> > > principles, I do think that would be the approach I would want to
> > > take/consider. But I simply lack the vision of how marrying the
> > > idea
> > > with the scalar language Python would work out well...
> > >
> >
> > I have had a glance at what you are after, and it seems challenging
> > indeed.  IMO, trying to cope with everybody's needs regarding data
> > types is extremely costly (or, more simply, not even possible).  I
> > think that a better approach would be to decouple the storage part
> > of a container from its data type system.  In the storage there
> > should go things needed to cope with the data retrieval, like the
> > itemsize, the shape, or even other sub-shapes for chunked datasets.
> > Then, in the data type layer, one should be able to add meaning to
> > the raw data: is that an integer?  speed?  temperature?  a compound
> > type?
> >
>
> I am struggling a bit to fully understand the lessons to learn.
>
> There seems to be some overlap of storage and DTypes? That is mainly
> `itemsize` and, more tricky, `is/has_object`, which is about how the
> data is stored but depends on which data is stored?
> In my current view these are part of the `dtype` instance, e.g. the
> class `np.dtype[np.string]` (a DTypeMeta instance) will have
> instances: `np.dtype[np.string](length=5, byteorder="=")` (which is
> identical to `np.dtype("U5")`).


> Or is it that `np.ndarray` would actually use an `np.naivearray`
> internally, which is told the itemsize at construction time?
> In principle, the DType class could also be much more basic, and NumPy
> could subclass it (or something similar) to tag on the things it needs
> to efficiently use the DTypes (outside the computational engine/UFuncs,
> which cover a lot, but unfortunately I do not think everything).
>

What I am trying to say is that NumPy should be rather agnostic about
providing data types beyond the relatively simple set that it already
supports.  I am suggesting that focusing on providing a way to allow
the storage (not only in-memory, but also persisted arrays via
.npy/.npz files) of user-defined data types (or any other kind of
metadata), and letting 3rd-party libraries use this machinery to
serialize/deserialize them, might be a better use of resources.

I am envisioning making life easier for libraries like e.g. xarray,
which already extends NumPy in a number of ways, and which can make use
of computational kernels other than NumPy itself (dask, probably numba
too) in order to implement functionality not present in NumPy.
Allowing an easy way to serialize library-defined data types would open
the door to using NumPy itself as a storage layer for persistence too,
bringing an important complement to the NetCDF or zarr formats
(remember that every format comes with its own pros and cons).

But xarray is just an example; why not think of other kinds of
libraries that would provide their own types, leveraging NumPy for
storage and e.g. numba for building a library of efficient functions
specific to the new types?  If done properly, these datasets can still
be shared efficiently with other libraries, as long as the basic data
type system existing in NumPy is used to access them.

Cheers,
Francesc





>
> - Sebastian
>
>
> > Indeed the data storage layer should be able to provide a way to
> > store the data type representation so that a container can be
> > serialized and deserialized correctly.  But the important thing here
> > is that this decoupling between storage and types allows for
> > different data type systems, so that anyone can come up with a
> > specific type system depending on her needs.  One can envision here
> > even a basic data type system (e.g. a version of what's now
> > supported in NumPy) that can be extended with other layers,
> > depending on the needs, so that every community can interchange data
> > at a basic level at least.
> >
> > As an example, this is the basic spirit behind the under-construction
> > Caterva array container (https://github.com/Blosc/Caterva).  Blosc2
> > (https://github.com/Blosc/C-Blosc2) will be providing the low-level
> > storage layer, with no information about dimensionality.  Caterva
> > will be building the multidimensional layer, but with no information
> > about the types at all.  On top of this scaffold, third-party layers
> > will be free to build their own data dtypes, specific for every
> > domain (the concept is illustrated in slide 18 of this presentation:
> > https://blosc.org/docs/Caterva-HDF5-Workshop.pdf).  There is nothing
> > to prevent adding more layers, or even a two-layer (and preferably
> > no more than two-level) data type system: one for simple data types
> > (e.g. NumPy ones) and one meant to be more domain-specific.
> >
> > R
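A minimal sketch of the storage/type-layer decoupling described in this
thread (all names hypothetical; this is not Caterva's or NumPy's API):

    import numpy as np

    class RawStorage:
        """Storage layer: knows only bytes, itemsize and shape -- no meaning."""
        def __init__(self, buffer, itemsize, shape):
            self.buffer = bytes(buffer)
            self.itemsize = itemsize
            self.shape = shape

    class TypedView:
        """Type layer: attaches meaning (a dtype plus, say, a unit tag)
        to the raw bytes; a domain library could plug in its own descriptor."""
        def __init__(self, storage, dtype, unit=None):
            self.storage = storage
            self.dtype = np.dtype(dtype)
            self.unit = unit
            assert self.dtype.itemsize == storage.itemsize

        def to_ndarray(self):
            return np.frombuffer(self.storage.buffer,
                                 dtype=self.dtype).reshape(self.storage.shape)

    raw = RawStorage(np.arange(6, dtype=np.float32).tobytes(),
                     itemsize=4, shape=(2, 3))
    speeds = TypedView(raw, np.float32, unit="m/s")
    print(speeds.to_ndarray(), speeds.unit)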

Re: [Numpy-discussion] Proposal: NEP 41 -- First step towards a new Datatype System

2020-03-24 Thread Matti Picus


On 24/3/20 11:48 am, Francesc Alted wrote:


What I am trying to say is that NumPy should be rather agnostic about
providing data types beyond the relatively simple set that it already
supports.  I am suggesting that focusing on providing a way to allow
the storage (not only in-memory, but also persisted arrays via
.npy/.npz files) of user-defined data types (or any other kind of
metadata), and letting 3rd-party libraries use this machinery to
serialize/deserialize them, might be a better use of resources.


...
Cheers,
Francesc

I agree that the goal is to enable user-defined data types, and even
to make the creation of them from Python possible (with some caveats
about performance). But I think this should be done in steps, and as
the subject line says, this is the first step. There are many scary
details to work out around the problems of promotion and casting, what
to do when the output might overflow, how to mark missing values, and
more. The question at hand is, as I understand it, one of finding the
right way to create a data type object that will enable exactly what
you propose. I think this is the correct path, as most large
refactor-in-one-step efforts I have seen leave both the old code and
the new code in an unusable state for years until the bugs are worked
out.



As for serialization protocols: I think that is a separate issue. We
already have the npy/npz protocol, the PEP 3118 buffer protocol, and
the pickle 5 buffering protocol. Each of them handles user-defined
data types in a different way, with differing amounts of success.



Matti

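The pickle 5 buffering protocol Matti mentions can be exercised
directly with NumPy arrays on Python 3.8+ (and a NumPy recent enough to
implement protocol 5); a minimal sketch:

    import pickle
    import numpy as np

    arr = np.arange(10, dtype=np.float64)

    # Out-of-band buffers: protocol 5 hands large payloads to a callback
    # instead of embedding them in the pickle byte stream.
    buffers = []
    payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

    # The same buffers must be supplied again (possibly zero-copy) to rebuild.
    restored = pickle.loads(payload, buffers=buffers)
    assert np.array_equal(arr, restored)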


Re: [Numpy-discussion] Proposal: NEP 41 -- First step towards a new Datatype System

2020-03-24 Thread Francesc Alted
On Tue, Mar 24, 2020 at 12:12 PM Matti Picus  wrote:

> ...

Thanks Matti for clarifying the goals of the NEP; having the phrase "New
Datatype System" in the title sounded scary to my ears indeed, and I
share your concerns about new code lingering in a 'beta' stage for a
long time.  Before shutting up, I'll just reiterate that providing
fairly shallow machinery for integrating user-defined data types should
avoid big headaches: the simpler, the better.  But this is of course up
to the maintainers.


>
> As for serialization protocols: I think that is a separate issue. We
> already have the npy/npz protocol, the PEP 3118 buffer protocol, and
> the pickle 5 buffering protocol. Each of them handles user-defined
> data types in a different way, with differing amounts of success.
>

Yup, I forgot the buffer protocol and pickle 5.  Thanks for the reminder.

Cheers,
-- 
Francesc Alted


[Numpy-discussion] Put type annotations in NumPy proper?

2020-03-24 Thread Stephan Hoyer
When we started numpy-stubs [1] a few years ago, putting type annotations
in NumPy itself seemed premature. We still supported Python 2, which meant
that we would need to use awkward comments for type annotations.

Over the past few years, using type annotations has become increasingly
popular, even in the scientific Python stack. For example, off-hand I know
that at least SciPy, pandas and xarray have at least part of their APIs
type annotated. Even without annotations for shapes or dtypes, it would be
valuable to have near complete annotations for NumPy, the project at the
bottom of the scientific stack.

Unfortunately, numpy-stubs never really took off. I can think of a few
reasons for that:
1. Missing high level guidance on how to write type annotations,
particularly for how (or if) to annotate particularly dynamic parts of
NumPy (e.g., consider __array_function__), and whether we should prioritize
strictness or faithfulness [2].
2. We didn't have a good experience for new contributors. Due to the
relatively low level of interest in the project, when a contributor would
occasionally drop in, I often didn't even notice their PR for a few weeks.
3. Developing type annotations separately from the main codebase makes them
a little harder to keep in sync. This means that type annotations couldn't
serve their typical purpose of self-documenting code. Part of this may be
necessary for NumPy (due to our use of C extensions), but large parts of
NumPy's user facing APIs are written in Python. We no longer support Python
2, so at least we no longer need to worry about putting annotations in
comments.

We eventually could probably use a formal NEP (or several) on how we want
to use type annotations in NumPy, but I think a good first step would be to
think about how to start moving the annotations from numpy-stubs into numpy
proper.

Any thoughts? Anyone interested in taking the lead on this?

Cheers,
Stephan

[1] https://github.com/numpy/numpy-stubs
[2] https://github.com/numpy/numpy-stubs/issues/12
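For readers unfamiliar with stubs: a simplified, stub-style sketch of
the kind of annotations involved (signatures abbreviated and
hypothetical, not the actual numpy-stubs content):

    # sketch of a numpy/__init__.pyi excerpt
    from typing import Tuple, Union

    class ndarray: ...

    def zeros(shape: Union[int, Tuple[int, ...]],
              dtype: object = ...,
              order: str = ...) -> ndarray: ...

    def transpose(a: ndarray,
                  axes: Union[None, Tuple[int, ...]] = ...) -> ndarray: ...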


Re: [Numpy-discussion] Put type annotations in NumPy proper?

2020-03-24 Thread Roman Yurchak
Thanks for re-starting this discussion, Stephan! I think there is 
definitely significant interest in this topic: 
https://github.com/numpy/numpy/issues/7370 is the issue with the largest 
number of user likes in the issue tracker (FWIW).


Having them in numpy, as opposed to a separate numpy-stubs repository,
would indeed be ideal from a user perspective. When looking into it in
the past, I was never sure how well in sync numpy-stubs was. Putting
aside ndarray as more challenging, even annotations for numpy functions
and method parameters with built-in types would help, as a start.


To add to the previously listed projects that would benefit from this,
we are currently considering starting to use some (minimal) type
annotations in scikit-learn.


--
Roman Yurchak

On 24/03/2020 18:00, Stephan Hoyer wrote:
...



[Numpy-discussion] Numpy doesn't use RAM

2020-03-24 Thread Keyvis Damptey
Hi Numpy dev community,

I'm keyvis, a statistical data scientist.

I'm currently using numpy in python 3.8.2 64-bit for a clustering problem,
on a machine with 1.9 TB RAM. When I try using np.zeros to create a 600,000
by 600,000 matrix of dtype=np.float32 it says
"Unable to allocate 1.31 TiB for an array with shape (600000, 600000) and
data type float32"

I used psutil to determine how much RAM Python thinks it has access to,
and it returned approx. 1.8 TB.

Is there some way I can fix numpy to create these large arrays?
Thanks for your time and consideration
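The requested size checks out, as a quick back-of-the-envelope
computation confirms (float32 is 4 bytes per element):

    >>> n = 600_000
    >>> round(n * n * 4 / 2**40, 2)   # bytes needed, expressed in TiB
    1.31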


Re: [Numpy-discussion] Put type annotations in NumPy proper?

2020-03-24 Thread Eric Wieser
>  Putting
> aside ndarray as more challenging, even annotations for numpy functions
> and method parameters with built-in types would help, as a start.

This is a good idea in principle, but one thing concerns me.

If we add type annotations to numpy, does it become an error to have
numpy-stubs installed?
That is, is this an all-or-nothing thing where as soon as we start,
numpy-stubs becomes unusable?

Eric

On Tue, 24 Mar 2020 at 17:28, Roman Yurchak  wrote:

> ...
>


Re: [Numpy-discussion] Numpy doesn't use RAM

2020-03-24 Thread Sebastian Berg
On Tue, 2020-03-24 at 13:59 -0400, Keyvis Damptey wrote:
> Hi Numpy dev community,
> 
> I'm keyvis, a statistical data scientist.
> 
> I'm currently using numpy in python 3.8.2 64-bit for a clustering
> problem, on a machine with 1.9 TB RAM. When I try using np.zeros to
> create a 600,000 by 600,000 matrix of dtype=np.float32 it says
> "Unable to allocate 1.31 TiB for an array with shape (600000, 600000)
> and data type float32"
>

If this error happens, allocating the memory failed. This should be
pretty much a simple `malloc` call in C, so this is the kernel
complaining, not Python/NumPy.

I am not quite sure, but maybe memory fragmentation plays its part, or
you are simply actually out of memory for that process; 1.44 TB is a
significant portion of the total memory, after all.

Not sure what to say, but I think you should probably look into other
solutions, maybe using HDF5, zarr, or memory-mapping (although I am not
sure the last actually helps). It will be tricky to work with arrays of
a size that is close to the available total memory.

Maybe someone who works more with such data here can give you tips on
what projects can help you or what solutions to look into.

- Sebastian



> I used psutil to determine how much RAM Python thinks it has access
> to, and it returned approx. 1.8 TB.
>
> Is there some way I can fix numpy to create these large arrays?
> Thanks for your time and consideration



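For reference, the psutil check mentioned in this thread looks roughly
like the sketch below; note that even when the reported available
memory exceeds the request, a single huge allocation can still fail
(fragmentation, overcommit policy, cgroup caps, ...):

    import psutil

    avail = psutil.virtual_memory().available   # bytes the OS reports free
    need = 600_000 * 600_000 * 4                # bytes for the float32 matrix
    print(f"available: {avail / 2**40:.2f} TiB, needed: {need / 2**40:.2f} TiB")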


Re: [Numpy-discussion] Numpy doesn't use RAM

2020-03-24 Thread Stanley Seibert
In addition to what Sebastian said about memory fragmentation and OS limits
about memory allocations, I do think it will be hard to work with an array
that close to the memory limit in NumPy regardless.  Almost any operation
will need to make a temporary array and exceed your memory limit.  You
might want to look at Dask Array for a NumPy-like API for working with
chunked arrays that can be staged in and out of memory:

https://docs.dask.org/en/latest/array.html

As a bonus, Dask will also let you make better use of the large number of
CPU cores that you likely have in your 1.9 TB RAM system.  :)

On Tue, Mar 24, 2020 at 1:00 PM Keyvis Damptey 
wrote:

> Hi Numpy dev community,
>
> I'm keyvis, a statistical data scientist.
>
> I'm currently using numpy in python 3.8.2 64-bit for a clustering problem,
> on a machine with 1.9 TB RAM. When I try using np.zeros to create a 600,000
> by 600,000 matrix of dtype=np.float32 it says
> "Unable to allocate 1.31 TiB for an array with shape (60, 60) and
> data type float32"
>
> I used psutils to determine how much RAM python thinks it has access to
> and it return with 1.8 TB approx.
>
> Is there some way I can fix numpy to create these large arrays?
> Thanks for your time and consideration
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
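A minimal sketch of the Dask route (assumes a `dask[array]` install;
the chunk size is illustrative):

    import dask.array as da

    # Lazily declare the 600,000 x 600,000 float32 array in
    # 10,000 x 10,000 chunks; nothing is allocated up front.
    x = da.zeros((600_000, 600_000), chunks=(10_000, 10_000), dtype="float32")

    # Operations build a task graph; compute() streams chunks through
    # memory instead of holding all 1.31 TiB at once.
    total = (x + 1).sum()
    print(total.compute())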


Re: [Numpy-discussion] Numpy doesn't use RAM

2020-03-24 Thread Benjamin Root
Another thing to point out about having an array of that percentage of the
available memory is that it severely restricts what you can do with it.
Since you are above 50% of the available memory, you won't be able to
create another array that would be the result of computing something with
that array. So, you are restricted to querying (which you could do without
having everything in-memory), or in-place operations.

Dask arrays might be what you are really looking for.

Ben Root

On Tue, Mar 24, 2020 at 2:18 PM Sebastian Berg 
wrote:

> ...
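To make the in-place point above concrete, a small sketch (the same
applies at any scale):

    import numpy as np

    a = np.ones((1000, 1000), dtype=np.float32)

    b = a * 2                  # allocates a second array of the same size
    np.multiply(a, 2, out=a)   # in-place: writes into a's existing buffer
    a *= 2                     # also in-place, via the __imul__ dunder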


Re: [Numpy-discussion] Put type annotations in NumPy proper?

2020-03-24 Thread Joshua Wilson
> That is, is this an all-or-nothing thing where as soon as we start, 
> numpy-stubs becomes unusable?

Until NumPy is made PEP 561 compatible by adding a `py.typed` file,
type checkers will ignore the types in the repo, so in theory you can
avoid the all or nothing. In practice it's maybe trickier because
currently people can use the stubs, but they won't be able to use the
types in the repo until the PEP 561 switch is flipped. So e.g.
currently SciPy pulls the stubs from `numpy-stubs` master, allowing
for a short

find place where NumPy stubs are lacking -> improve stubs -> improve SciPy types

loop. If all development moves into the main repo then SciPy is
blocked on it becoming PEP 561 compatible before moving forward. But,
you could complain that I put the cart before the horse with
introducing typing in the SciPy repo before the NumPy types were more
resolved, and that's probably a fair complaint.

> Anyone interested in taking the lead on this?

Not that I am a core developer or anything, but I am interested in
helping to improve typing in NumPy.

On Tue, Mar 24, 2020 at 11:15 AM Eric Wieser
 wrote:
> ...
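For context, the PEP 561 "switch" referred to above is literally a
marker file named `py.typed` shipped inside the package; a sketch of
the packaging side (package name hypothetical):

    from setuptools import setup, find_packages

    setup(
        name="mypkg",                          # hypothetical package
        packages=find_packages(),
        package_data={"mypkg": ["py.typed"]},  # the PEP 561 marker file
        zip_safe=False,                        # type checkers can't read zips
    )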

Re: [Numpy-discussion] NumPy-Discussion Digest, Vol 162, Issue 27

2020-03-24 Thread Keyvis Damptey
Thanks. This is so embarrassing, but I wasn't able to create a new
matrix because I forgot to delete the original massive matrix. I was
testing how big it could go in terms of rows/columns before reaching the
limit, and forgot to delete the last object before creating a new one.
Sadly that memory usage was not reflected in the task manager for the VM
instance.

On Tue, Mar 24, 2020, 6:44 PM  wrote:

> Send NumPy-Discussion mailing list submissions to
> numpy-discussion@python.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://mail.python.org/mailman/listinfo/numpy-discussion
> or, via email, send a message with subject or body 'help' to
> numpy-discussion-requ...@python.org
>
> You can reach the person managing the list at
> numpy-discussion-ow...@python.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of NumPy-Discussion digest..."
>
>
> Today's Topics:
>
>1. Re: Numpy doesn't use RAM (Sebastian Berg)
>2. Re: Numpy doesn't use RAM (Stanley Seibert)
>3. Re: Numpy doesn't use RAM (Benjamin Root)
>4. Re: Put type annotations in NumPy proper? (Joshua Wilson)
>
>
> --
>
> Message: 1
> Date: Tue, 24 Mar 2020 13:15:47 -0500
> From: Sebastian Berg 
> To: numpy-discussion@python.org
> Subject: Re: [Numpy-discussion] Numpy doesn't use RAM
> Message-ID:
> 
> Content-Type: text/plain; charset="utf-8"
>
> On Tue, 2020-03-24 at 13:59 -0400, Keyvis Damptey wrote:
> > Hi Numpy dev community,
> >
> > I'm keyvis, a statistical data scientist.
> >
> > I'm currently using numpy in python 3.8.2 64-bit for a clustering
> > problem,
> > on a machine with 1.9 TB RAM. When I try using np.zeros to create a
> > 600,000
> > by 600,000 matrix of dtype=np.float32 it says
> > "Unable to allocate 1.31 TiB for an array with shape (60, 60)
> > and
> >
> > data type float32"
> >
>
> If this error happens, allocating the memory failed. This should be
> pretty much a simple `malloc` call in C, so this is the kernel
> complaining, not Python/NumPy.
>
> I am not quite sure, but maybe memory fragmentation plays its part, or
> simply are actually out of memory for that process, 1.44TB is a
> significant portion of the total memory after all.
>
> Not sure what to say, but I think you should probably look into other
> solutions, maybe using HDF5, zarr, or memory-mapping (although I am not
> sure the last actually helps). It will be tricky to work with arrays of
> a size that is close to the available total memory.
>
> Maybe someone who works more with such data here can give you tips on
> what projects can help you or what solutions to look into.
>
> - Sebastian
>
>
>
> > I used psutils to determine how much RAM python thinks it has access
> > to and
> > it return with 1.8 TB approx.
> >
> > Is there some way I can fix numpy to create these large arrays?
> > Thanks for your time and consideration
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> -- next part --
> A non-text attachment was scrubbed...
> Name: signature.asc
> Type: application/pgp-signature
> Size: 833 bytes
> Desc: This is a digitally signed message part
> URL: <
> http://mail.python.org/pipermail/numpy-discussion/attachments/20200324/16501583/attachment-0001.sig
> >
>
> --
>
> Message: 2
> Date: Tue, 24 Mar 2020 13:35:49 -0500
> From: Stanley Seibert 
> To: Discussion of Numerical Python 
> Subject: Re: [Numpy-discussion] Numpy doesn't use RAM
> Message-ID:
> <
> cadv3rktjbo48a+eyjn7m+gpt2iasd8esaihtzs0vqlnuny_...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> In addition to what Sebastian said about memory fragmentation and OS limits
> about memory allocations, I do think it will be hard to work with an array
> that close to the memory limit in NumPy regardless.  Almost any operation
> will need to make a temporary array and exceed your memory limit.  You
> might want to look at Dask Array for a NumPy-like API for working with
> chunked arrays that can be staged in and out of memory:
>
> https://docs.dask.org/en/latest/array.html
>
> As a bonus, Dask will also let you make better use of the large number of
> CPU cores that you likely have in your 1.9 TB RA

[Numpy-discussion] NumPy Development Meeting - Triage Focus

2020-03-24 Thread Sebastian Berg
Hi all,

Our bi-weekly triage-focused NumPy development meeting is tomorrow
(Wednesday, March 25) at 11 am Pacific Time. Everyone is invited
to join in and edit the work-in-progress meeting topics and notes:
https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg

I encourage everyone to notify us of issues or PRs that you feel should
be prioritized or simply discussed briefly. Just comment on it so we
can label it, or add your PR/issue to this week's topics for discussion.

Best regards

Sebastian






Re: [Numpy-discussion] Put type annotations in NumPy proper?

2020-03-24 Thread Juan Nunez-Iglesias
I'd like to offer a +1 from skimage's perspective (and napari's!) for having 
NumPy types directly in the repo. We have been wanting to start using type 
annotations, but the lack of types in NumPy proper, together with the 
uncertainty about whether numpy-stubs was "officially supported" and in-sync 
with NumPy itself, have been factors in holding us back.

The __array_function__ problem is a major source of confusion for those of us 
unfamiliar with typing theory. Protocols would seem to solve this but they were 
only recently accepted and are only available by default in Python 3.8+:

https://www.python.org/dev/peps/pep-0544/
https://mypy.readthedocs.io/en/stable/protocols.html

I'd like to avoid driving this discussion off-topic, so it would be great if 
there was a "typing interest group" or similar list where we could discuss 
these issues. One thing that we would love to do in skimage is distinguish 
different kinds of arrays using NewType, e.g.

import numpy as np
from typing import NewType

Image = NewType('Image', np.ndarray)
Coordinates = NewType('Coordinates', np.ndarray)

def find_maxima(image: Image) -> Coordinates:
    ...

def gaussian_filter(image: Image, sigma: float) -> Image:
    ...

then

find_maxima(gaussian_filter(image, num))

validates, but

gaussian_filter(find_maxima(image), num)

does not, even though the arguments are all NumPy arrays. However, the above 
cannot be combined with an array Protocol:

https://www.python.org/dev/peps/pep-0544/#newtype-and-type-aliases

I'd love to understand the reasons why, and whether this decision can be 
reversed, but I am out of my depth. Again, this is probably something for a 
different thread, but I just wanted to flag it as something to discuss as we 
develop a typing framework to suit the entire SciPy ecosystem. Also if someone 
has resources for "type theory for beginners", both in Python and more 
generally, they would be appreciated!

Thanks Stephan for raising this issue!

Juan.

On Tue, 24 Mar 2020, at 5:42 PM, Joshua Wilson wrote:
> ...

Re: [Numpy-discussion] Numpy doesn't use RAM

2020-03-24 Thread YueCompl
An alternative solution may be
https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html

If you are sure your subsequent computation against the array data has
enough locality to avoid thrashing, I think numpy.memmap would work for
you, i.e. using an explicit disk file to serve as swap.

My env does a lot of mmap'ing of on-disk data files from C++ (after
Python reads the metadata), then wraps them as ndarrays; that's enough
to run out-of-core programs as long as the data access patterns fit in
physical RAM at any instant. Then even scanning the whole dataset along
the time axis is okay (in real-world use).

Memory (address space) fragmentation is a problem, besides the OS's
`nofile` limitation (the number of file handles held open) when too many
small data files are involved; we are switching to a solution based on a
FUSE filesystem that presents a virtual large file viewing many small
files on a remote storage server.

Cheers,
Compl

> On 2020-03-25, at 02:35, Stanley Seibert  wrote:
> 
> ...

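A minimal numpy.memmap sketch of the "explicit disk file as swap" idea
(file name illustrative; note the backing file occupies the full 1.44 TB
on disk, though most filesystems will create it sparsely):

    import numpy as np

    # Back the array with a disk file instead of RAM; the OS pages
    # data in and out as it is touched.
    fp = np.memmap("big_matrix.dat", dtype=np.float32,
                   mode="w+", shape=(600_000, 600_000))

    fp[:1000, :1000] = 1.0   # touches only the pages actually written
    fp.flush()               # push dirty pages out to the file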