Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-12-12 Thread Juan Nunez-Iglesias
Hey Nathaniel,

Fascinating! Thanks for the primer! I didn't know that np.array would check
the dtype of every value in the whole input. In that case, I agree that it
would be bad to infer the dtype magically from just the first value, and
this can be left to the users.

Thanks!

Juan.

On Sat, Dec 12, 2015 at 7:00 PM, Nathaniel Smith  wrote:

> On Fri, Dec 11, 2015 at 11:32 PM, Juan Nunez-Iglesias
>  wrote:
> > Nathaniel,
> >
> >> IMO this is better than making np.array(iter) internally call list(iter)
> >> or equivalent
> >
> > Yeah but that's not the only option:
> >
> > from itertools import chain
> > def fromiter_awesome_edition(iterable):
> >     elem = next(iterable)
> >     dtype = whatever_numpy_does_to_infer_dtypes_from_lists(elem)
> >     return np.fromiter(chain([elem], iterable), dtype=dtype)
> >
> > I think this would be a huge win for usability. Always getting tripped
> > up by the dtype requirement. I can submit a PR if people like this
> > pattern.
>
> This isn't the semantics of np.array, though -- np.array will look at
> the whole input and try to find a common dtype, so this can't be the
> implementation for np.array(iter). E.g. try np.array([1, 1.0])
>
> I can see an argument for making the dtype= argument to fromiter
> optional, with a warning in the docs that it will guess based on the
> first element and that you should specify it if you don't want that.
> It seems potentially a bit error prone (in the sense that it might
> make it easier to end up with code that works great when you test it
> but then breaks later when something unexpected happens), but maybe
> the usability outweighs that. I don't use fromiter myself so I don't
> have a strong opinion.
>
> > btw, I think np.array(['f', 'o', 'o']) would be exactly the expected
> > result for np.array('foo'), but I guess that's just me.
>
> In general np.array(thing_that_can_go_inside_an_array) returns a
> zero-dimensional (scalar) array -- np.array(1), np.array(True), etc.
> all work like this, so I'd expect np.array("foo") to do the same.
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
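
For concreteness, a minimal sketch contrasting the two inference behaviors
discussed above. The helper fromiter_first_elem is hypothetical (it is not
part of numpy); outputs assume a platform whose default integer is 64-bit:

    import numpy as np
    from itertools import chain

    def fromiter_first_elem(iterable):
        # Hypothetical helper: guess the dtype from the first element only,
        # then stream everything through np.fromiter.
        it = iter(iterable)
        elem = next(it)
        dtype = np.asarray(elem).dtype
        return np.fromiter(chain([elem], it), dtype=dtype)

    np.array([1, 1.5])             # scans all values -> array([1. , 1.5])
    fromiter_first_elem([1, 1.5])  # guesses int64 from 1 -> array([1, 1]);
                                   # the 1.5 is silently truncated

    np.array("foo").shape          # () -- a zero-dimensional array, as above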


Re: [Numpy-discussion] Memory mapping and NPZ files

2015-12-12 Thread Nathaniel Smith
On Dec 12, 2015 10:53 AM, "Mathieu Dubois" 
wrote:
>
> On 11/12/2015 11:22, Sturla Molden wrote:
>>
>> Mathieu Dubois  wrote:
>>
>>> The point is precisely that, you can't do memory mapping with Npz files
>>> (while it works with Npy files).
>>
>> The operating system can memory map any file. But as npz-files are
>> compressed, you will need to uncompress the contents in your memory
>> mapping to make sense of it.
>
> We agree on that. The goal is to be able to create a np.memmap array from
> an Npz file.
>
>
>> I would suggest you use PyTables instead of npz-files.
>> It allows on the fly compression and uncompression (via blosc) and will
>> probably do what you want.
>
> Yes I know I can use other solutions. The point is that np.load silently
> ignores the mmap option, so I wanted to discuss ways to improve this.

I can see a good argument for transitioning to a rule where mmap=False
doesn't mmap, mmap=True mmaps if the file is uncompressed and raises an
error for compressed files, and mmap="if-possible" gives the current
behavior.

(It's even possible that the current code would already accept
"if-possible" as an alias for True, which would make the transition easier.)

Or maybe "never"/"always"/"if-possible" would be better for type
consistency reasons, while deprecating the use of bools altogether. But
this transition might be a bit more of a hassle, since these definitely
won't work on older numpy versions.
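
A minimal sketch of the proposed tri-state rule, written as a hypothetical
wrapper (load_with_mmap_policy is not a real numpy function; np.load and
its mmap_mode argument are):

    import numpy as np
    import zipfile

    def load_with_mmap_policy(path, mmap="if-possible"):
        # .npz archives are zip files; plain .npy files can be mapped directly.
        compressed = zipfile.is_zipfile(path)
        if mmap in (False, "never"):
            return np.load(path)
        if mmap in (True, "always"):
            if compressed:
                raise ValueError("cannot memory-map a compressed .npz file")
            return np.load(path, mmap_mode="r")
        # "if-possible": the current behavior -- mmap when the format allows it.
        return np.load(path, mmap_mode=None if compressed else "r")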

Silently creating a massive temporary file doesn't seem like a great idea
to me in any case. Creating a temporary file + mmapping it is essentially
equivalent to just loading the data into swappable RAM, except that the
swap case is guaranteed not to accidentally leave a massive temp file lying
around afterwards.

-n


Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-12-12 Thread Peter Creasey
> >
> > from itertools import chain
> > def fromiter_awesome_edition(iterable):
> >     elem = next(iterable)
> >     dtype = whatever_numpy_does_to_infer_dtypes_from_lists(elem)
> >     return np.fromiter(chain([elem], iterable), dtype=dtype)
> >
> > I think this would be a huge win for usability. Always getting tripped up by
> > the dtype requirement. I can submit a PR if people like this pattern.
>
> This isn't the semantics of np.array, though -- np.array will look at
> the whole input and try to find a common dtype, so this can't be the
> implementation for np.array(iter). E.g. try np.array([1, 1.0])
>
> I can see an argument for making the dtype= argument to fromiter
> optional, with a warning in the docs that it will guess based on the
> first element and that you should specify it if you don't want that.
> It seems potentially a bit error prone (in the sense that it might
> make it easier to end up with code that works great when you test it
> but then breaks later when something unexpected happens), but maybe
> the usability outweighs that. I don't use fromiter myself so I don't
> have a strong opinion.

I'm -1 on this, as an occasional user of np.fromiter, also because of the
np.fromiter([1, 1.5, 2]) ambiguity mentioned above. Pure Python does a
great job of preventing users from hurting themselves with limited
precision arithmetic; however, if their application makes them care enough
about speed (to be using numpy) and memory (to be using np.fromiter), then
it can almost always be assumed that the resulting dtype is important
enough to be specified explicitly.

P
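
As a usage note, the explicit-dtype pattern that np.fromiter requires today
(dtype and the optional count are real np.fromiter parameters; the
generator is just an example):

    import numpy as np

    # Explicit dtype is unambiguous, and count= lets fromiter preallocate
    # the output instead of growing it as the iterator is consumed.
    squares = np.fromiter((x * x for x in range(10)), dtype=np.float64,
                          count=10)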


Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-12 Thread Anne Archibald
On Fri, Dec 11, 2015, 18:04 David Cournapeau  wrote:

> On Fri, Dec 11, 2015 at 4:22 PM, Anne Archibald  wrote:
>
> > Actually, GCC implements 128-bit floats in software and provides them as
> > __float128; there are also quad-precision versions of the usual
> > functions. The Intel compiler provides this as well, I think, but I
> > don't think Microsoft compilers do. A portable quad-precision library
> > might be less painful.
> >
> > The cleanest way to add extended precision to numpy is by adding a
> > C-implemented dtype. This can be done in an extension module; see the
> > quaternion and half-precision modules online.
>
> We actually used a __float128 dtype as an example of how to create a
> custom dtype for a numpy C tutorial we did with Stefan Van der Walt a few
> years ago at SciPy.
>
> IIRC, one of the issues in making it more than a PoC was that numpy
> hardcoded things like long double being the highest precision, etc. But
> that may have been fixed since then.

I did some work on numpy's long-double support, partly to better understand
what would be needed to make quads work. The main obstacle is, I think, the
same: python floats are only 64-bit, and many functions are stuck passing
through them. It takes a lot of fiddling to make string conversions work
without passing through python floats, for example, and it takes some care
to produce scalars of the appropriate type. There are a few places where
you'd want to modify the guts of numpy if you had a higher precision
available than long doubles.

Anne
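
A small illustration of the "stuck passing through python floats" problem
described above (the results assume a platform, such as x86 Linux, where
np.longdouble is wider than a 64-bit double):

    import numpy as np

    a = np.longdouble(0.1)    # 0.1 is rounded to a 64-bit double first
    b = np.longdouble("0.1")  # string parsing can retain the extra precision
    print(a == b)             # False where long double is wider than double
    print(np.finfo(np.longdouble).precision)  # e.g. 18 digits for 80-bit x86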


Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-12 Thread Elliot Hallmark
> What does "true vectorization" mean anyway?

Calling python functions on python objects in a for loop is not really
vectorized.  It's much slower than people expect when they use numpy.

Elliot
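
A minimal illustration of the distinction (no timings shown, but the gap is
typically one to two orders of magnitude):

    import numpy as np

    x = np.arange(1000000, dtype=np.float64)

    y_fast = np.sin(x)                         # one C loop over the buffer
    y_slow = np.array([np.sin(v) for v in x])  # Python-level loop over boxed
                                               # scalars: not really vectorized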


Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-12 Thread Marten van Kerkwijk
Hi All,

astropy's `Time` does indeed use two doubles internally, but it is very
limited in the operations it allows: essentially only addition/subtraction,
and multiplication with/division by a normal double.

It would be great to have better support within numpy; it is a pity to have
a float128 type that does not provide the full precision its name suggests
(on most platforms it is an 80-bit extended-precision long double padded to
128 bits).

All the best,

Marten
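
For reference, a minimal sketch of the two-doubles technique mentioned
above (this is Knuth's TwoSum building block; astropy's actual Time
implementation is more involved):

    import numpy as np

    def two_sum(a, b):
        # Error-free addition: a + b == s + err exactly, where s is the
        # rounded float64 sum and err is the rounding error s discarded.
        s = a + b
        bp = s - a
        err = (a - (s - bp)) + (b - bp)
        return s, err

    hi, lo = two_sum(np.float64(1e16), np.float64(1.0))
    # hi == 1e16 (1.0 is below float64 resolution at that magnitude) and
    # lo == 1.0 recovers the part a plain sum would have lost.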



On Sat, Dec 12, 2015 at 1:02 PM, Sturla Molden 
wrote:

> "Thomas Baruchel"  wrote:
>
> > While this is obviously the most relevant answer for many users because
> > it will allow them to use Numpy arrays exactly
> > as they would have used them with native types, the wrong thing is that
> > from some point of view "true" vectorization
> > will be lost.
>
> What does "true vectorization" mean anyway?
>
>
> Sturla
>