Re: [Numpy-discussion] Memory mapping and NPZ files

2015-12-11 Thread Sturla Molden
Mathieu Dubois  wrote:

> The point is precisely that you can't do memory mapping with npz files
> (while it works with npy files).

The operating system can memory-map any file, but since npz files are
compressed, you would need to decompress the contents inside your mapping
to make sense of them. I would suggest using PyTables instead of npz
files: it supports on-the-fly compression and decompression (via blosc)
and will probably do what you want.
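A minimal sketch of that route, assuming the PyTables (`tables`) package is installed (the filename and compression level here are made up for illustration):

```python
import os
import tempfile
import numpy as np
import tables

data = np.arange(1000000, dtype=np.float64)
path = os.path.join(tempfile.mkdtemp(), "data.h5")

# Write the array as a compressed chunked array (blosc, level 5).
with tables.open_file(path, mode="w") as f:
    filters = tables.Filters(complib="blosc", complevel=5)
    f.create_carray(f.root, "x", obj=data, filters=filters)

# Read back only a slice: PyTables decompresses just the chunks it
# needs, instead of inflating the whole array into memory.
with tables.open_file(path, mode="r") as f:
    chunk = f.root.x[1000:2000]
```

Unlike a memory-mapped npy file this is not zero-copy, but only the touched chunks are ever decompressed.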

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy intermittent seg fault

2015-12-11 Thread Antoine Pitrou

Hi,

On Fri, 11 Dec 2015 10:05:59 +1000
Jacopo Sabbatini  wrote:
> 
> I'm experiencing random segmentation faults from numpy. I have generated a
> core dump and extracted the following stack trace:
> 
> #0  0x7f3a8d921d5d in getenv () from /lib64/libc.so.6
> #1  0x7f3a843bde21 in blas_set_parameter () from
> /opt/apps/sidescananalysis-9.7.1-42-gdd3e068+dev/lib/python2.7/site-packages/numpy/core/../../../../libopenblas.so.0
> #2  0x7f3a843bcd91 in blas_memory_alloc () from
> /opt/apps/sidescananalysis-9.7.1-42-gdd3e068+dev/lib/python2.7/site-packages/numpy/core/../../../../libopenblas.so.0
> #3  0x7f3a843bd4e5 in blas_thread_server () from
> /opt/apps/sidescananalysis-9.7.1-42-gdd3e068+dev/lib/python2.7/site-packages/numpy/core/../../../../libopenblas.so.0
> #4  0x7f3a8e09ff18 in start_thread () from /lib64/libpthread.so.0
> #5  0x7f3a8d9ceb2d in clone () from /lib64/libc.so.6
> 
> I have experienced the segfault from several code paths, but they all have
> the same stack trace.
> 
> I use conda to run Python and NumPy. The package versions are:

In addition to reporting it to OpenBLAS, you should also submit a bug to
Anaconda so that they know about problems with that particular OpenBLAS
version:
https://github.com/ContinuumIO/anaconda-issues

Regards

Antoine.




[Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-11 Thread Thomas Baruchel
From time to time it is asked on forums how to extend the precision of
computation on NumPy arrays. The most common answer given to this question
is: use dtype=object with some arbitrary-precision module like mpmath or
gmpy. See
http://stackoverflow.com/questions/6876377/numpy-arbitrary-precision-linear-algebra
or http://stackoverflow.com/questions/21165745/precision-loss-numpy-mpmath or
http://stackoverflow.com/questions/15307589/numpy-array-with-mpz-mpfr-values

While this is obviously the most relevant answer for many users, because it
lets them use NumPy arrays exactly as they would with native types, the
drawback is that, from some point of view, "true" vectorization is lost.

Over the years I have become very familiar with the extended double-double
type, which has (on usual architectures) about 32 accurate digits, with
faster arithmetic than arbitrary-precision types. I have even used it for
research in number theory, and I am convinced it is a wonderful type
whenever such precision suffices.

I have often implemented it partially on top of NumPy, most of the time by
trying to vectorize the libqd library at a low level.

But I recently thought that a very nice and portable way of implementing it
under NumPy would be to reuse the existing vectorization layer on floats,
computing the arithmetic operations on "columns containing half of each
number" rather than on "full numbers". As a proof of concept I wrote the
following file: https://gist.github.com/baruchel/c86ed748939534d8910d

I converted and vectorized the Algol 60 code from
http://szmoore.net/ipdf/documents/references/dekker1971afloating.pdf
(Dekker, 1971).
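The core of Dekker's approach is a pair of error-free transformations, each of which returns its exact result as an unevaluated sum of two doubles. A minimal NumPy sketch (not the gist itself) of the two basic building blocks, which vectorize directly over arrays:

```python
import numpy as np

def two_sum(a, b):
    """Knuth's error-free addition: returns (s, e) with s + e == a + b
    exactly, where s = fl(a + b) and e is the rounding error."""
    s = a + b
    t = s - a
    e = (a - (s - t)) + (b - t)
    return s, e

def split(a):
    """Dekker's split of a double into high/low parts of <= 27 bits each,
    so that hi * hi, hi * lo, etc. are exact products."""
    c = (2.0 ** 27 + 1.0) * a
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    """Dekker's error-free multiplication: p + e == a * b exactly
    (barring overflow/underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e
```

A double-double number is then stored as two float64 arrays (high and low parts), and every arithmetic operation is built from these primitives, so everything stays vectorized.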

A test is provided at the end; for inverting 100,000 numbers, my type is
about 3 or 4 times faster than gmpy and almost 50 times faster than mpmath.
It should be even faster for some other operations, since I had to create
an extra np.ones array for this test because inversion isn't implemented
here (which could of course be done). You can run the file yourself (you
may have to comment out mpmath or gmpy if you don't have them).

I would like to discuss how to make something along these lines available.

a) Would it be relevant to include this in NumPy? (I would think of some
"contribution" tool rather than the core of NumPy, because it would be
painful to code all the ufuncs; on the other hand, I am pretty sure many
would be happy to perform basic arithmetic even knowing that they can't use
cos/sin/etc. on this type. In other words, I am not sure it would be a good
idea to embed it as an everyday type, but I think it would be nice to have
it quickly available in some way.) If you agree, how should I code it? (The
current link is only a proof of concept; I would be very happy to code it
more cleanly.)

b) Or do you think such an attempt should remain external to NumPy itself
and be released on my GitHub account without being integrated into NumPy?

Best regards,

-- 
Thomas Baruchel


Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-11 Thread josef.pktd
On Fri, Dec 11, 2015 at 11:22 AM, Anne Archibald  wrote:
> Actually, GCC implements 128-bit floats in software and provides them as
> __float128; there are also quad-precision versions of the usual functions.
> The Intel compiler provides this as well, I think, but I don't think
> Microsoft compilers do. A portable quad-precision library might be less
> painful.
>
> The cleanest way to add extended precision to numpy is by adding a
> C-implemented dtype. This can be done in an extension module; see the
> quaternion and half-precision modules online.
>
> Anne
>
>
> On Fri, Dec 11, 2015, 16:46 Charles R Harris 
> wrote:
>>
>> On Fri, Dec 11, 2015 at 6:25 AM, Thomas Baruchel  wrote:
>>> [...]
>>
>>
>> I think astropy does something similar for time and dates. There has also
>> been some talk of adding a user type for ieee 128 bit doubles. I've looked
>> once for relevant code for the latter and, IIRC, the available packages were
>> GPL :(.

This might be the same as or similar to a recent announcement for Julia

https://groups.google.com/d/msg/julia-users/iHTaxRVj1yM/M-WtZCedCQAJ


It would be useful to get this in a consistent way across platforms and
compilers. I can think of several applications in statistics where
higher-precision reduce operations would be useful. As a Windows user, I
have never even seen a higher-precision float.

Josef


>>
>> Chuck

Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-11 Thread Anne Archibald
Actually, GCC implements 128-bit floats in software and provides them as
__float128; there are also quad-precision versions of the usual functions.
The Intel compiler provides this as well, I think, but I don't think
Microsoft compilers do. A portable quad-precision library might be less
painful.

The cleanest way to add extended precision to numpy is by adding a
C-implemented dtype. This can be done in an extension module; see the
quaternion and half-precision modules online.
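For reference, what NumPy already ships is platform-dependent: np.longdouble maps to whatever the C compiler's long double is, not to IEEE binary128. A quick way to check what a given platform actually provides:

```python
import numpy as np

# Inspect this platform's np.longdouble.
ld = np.finfo(np.longdouble)
print("itemsize:", np.dtype(np.longdouble).itemsize, "bytes")
print("decimal digits:", ld.precision)
# ~18 digits for the 80-bit x87 type (typical x86 Linux/gcc),
# ~33 for true IEEE binary128, and 15 where long double == double (MSVC).
```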

Anne

On Fri, Dec 11, 2015, 16:46 Charles R Harris 
wrote:

> On Fri, Dec 11, 2015 at 6:25 AM, Thomas Baruchel  wrote:
>
>> [...]
>>
>
> I think astropy does something similar for time and dates. There has also
> been some talk of adding a user type for ieee 128 bit doubles. I've looked
> once for relevant code for the latter and, IIRC, the available packages
> were GPL :(.
>
> Chuck


Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-11 Thread David Cournapeau
On Fri, Dec 11, 2015 at 4:22 PM, Anne Archibald  wrote:

> Actually, GCC implements 128-bit floats in software and provides them as
> __float128; there are also quad-precision versions of the usual functions.
> The Intel compiler provides this as well, I think, but I don't think
> Microsoft compilers do. A portable quad-precision library might be less
> painful.
>
> The cleanest way to add extended precision to numpy is by adding a
> C-implemented dtype. This can be done in an extension module; see the
> quaternion and half-precision modules online.
>

We actually used __float128 dtype as an example of how to create a custom
dtype for a numpy C tutorial we did w/ Stefan Van der Walt a few years ago
at SciPy.

IIRC, one of the issues in making it more than a PoC was that numpy
hardcoded things like long double being the highest precision, etc. But
that may have been fixed since then.

David

> Anne
>
> On Fri, Dec 11, 2015, 16:46 Charles R Harris 
> wrote:
>
>> On Fri, Dec 11, 2015 at 6:25 AM, Thomas Baruchel 
>> wrote:
>>
>>> [...]
>>>
>>
>> I think astropy does something similar for time and dates. There has also
>> been some talk of adding a user type for ieee 128 bit doubles. I've looked
>> once for relevant code for the latter and, IIRC, the available packages
>> were GPL :(.
>>
>> Chuck

Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-11 Thread Chris Barker - NOAA Federal
> There has also been some talk of adding a user type for ieee 128 bit doubles. 
> I've looked once for relevant code for the latter and, IIRC, the available 
> packages were GPL :(.


This looks like it's BSD-Ish:

http://www.jhauser.us/arithmetic/SoftFloat.html

Don't know if it's any good

CHB


>
> Chuck


[Numpy-discussion] ANN: pyMIC v0.7 Released

2015-12-11 Thread Klemm, Michael
Announcement: pyMIC v0.7
========================

I'm happy to announce the release of pyMIC v0.7.

pyMIC is a Python module to offload computation in a Python program to the 
Intel Xeon Phi coprocessor.  It contains offloadable arrays and device 
management functions.  It supports invocation of native kernels (C/C++, 
Fortran) and blends in with Numpy's array types for float, complex, and int 
data types.

For more information and downloads please visit pyMIC's Github page: 
https://github.com/01org/pyMIC.  You can find pyMIC's mailinglist at 
https://lists.01.org/mailman/listinfo/pymic.

Full change log:
================

Version 0.7

* Experimental support for Python 3.
* 'None' arguments of kernels are converted to nullptr or NULL.
* Switched to Python's distutils to build and install pyMIC.
* Deprecated the build system based on Makefiles.

Version 0.6

* Experimental support for the Windows operating system.
* Switched to Cython to generate the glue code for pyMIC.
* Now using Markdown for README and CHANGELOG.
* Introduced PYMIC_DEBUG=3 to trace argument passing for kernels.
* Bugfix: added back the translate_device_pointer() function.
* Bugfix: example SVD now respects order of the passed matrices when applying 
the `dgemm` routine.
* Bugfix: fixed memory leak when invoking kernels.
* Bugfix: fixed broken translation of fake pointers.
* Refactoring: simplified bridge between pyMIC and LIBXSTREAM.

Version 0.5

* Introduced new kernel API that avoids insane pointer unpacking.
* pyMIC now uses libxstreams as the offload back-end
  (https://github.com/hfp/libxstream).
* Added smart pointers to make handling of fake pointers easier.

Version 0.4

* New low-level API to allocate, deallocate, and transfer data
  (see OffloadStream).
* Support for in-place binary operators.
* New internal design to handle offloads.

Version 0.3

* Improved handling of libraries and kernel invocation.
* Trace collection (PYMIC_TRACE=1, PYMIC_TRACE_STACKS={none,compact,full}).
* Replaced the device-centric API with a stream API.
* Refactoring to better match PEP8 recommendations.
* Added support for int(int64) and complex(complex128) data types.
* Reworked the benchmarks and examples to fit the new API.
* Bugfix: fixed syntax errors in OffloadArray.

Version 0.2

* Small improvements to the README files.
* New example: Singular Value Decomposition.
* Some documentation for the API functions.
* Added a basic testsuite for unit testing (WIP).
* Bugfix: benchmarks now use the latest interface.
* Bugfix: numpy.ndarray does not offer an attribute 'order'.
* Bugfix: number_of_devices was not visible after import.
* Bugfix: member offload_array.device is now initialized.
* Bugfix: use exception for errors w/ invoke_kernel & load_library.

Version 0.1

Initial release.

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928



Re: [Numpy-discussion] Memory mapping and NPZ files

2015-12-11 Thread Erik Bray
On Wed, Dec 9, 2015 at 9:51 AM, Mathieu Dubois
 wrote:
> Dear all,
>
> If I am correct, using mmap_mode with Npz files has no effect i.e.:
> f = np.load("data.npz", mmap_mode="r")
> X = f['X']
> will load all the data in memory.
>
> Can somebody confirm that?
>
> If I'm correct, the mmap_mode argument could be passed to the NpzFile class
> which could in turn perform the correct operation. One way to handle that
> would be to use the ZipFile.extract method to write the Npy file on disk and
> then load it with numpy.load with the mmap_mode argument. Note that the user
> will have to remove the file to reclaim disk space (I guess that's OK).
>
> One problem that could arise is that the extracted Npy file can be large
> (it's the purpose of using memory mapping) and therefore it may be useful to
> offer some control on where this file is extracted (for instance /tmp can be
> too small to extract the file here). numpy.load could offer a new option for
> that (passed to ZipFile.extract).
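This extract-then-mmap idea could be sketched as follows (load_npz_mmap is a hypothetical helper, not an existing NumPy API; as noted above, the caller must remove the extracted .npy file to reclaim disk space):

```python
import os
import tempfile
import zipfile
import numpy as np

def load_npz_mmap(npz_path, name, extract_dir=None):
    """Extract one array from an .npz archive to disk, then memory-map it.
    The caller is responsible for removing the extracted .npy file."""
    if extract_dir is None:
        extract_dir = tempfile.mkdtemp()
    with zipfile.ZipFile(npz_path) as zf:
        # np.savez stores each array as "<name>.npy" inside the zip.
        npy_path = zf.extract(name + ".npy", path=extract_dir)
    return np.load(npy_path, mmap_mode="r")

# Usage:
tmp = tempfile.mkdtemp()
npz = os.path.join(tmp, "data.npz")
np.savez(npz, X=np.arange(10.0))
X = load_npz_mmap(npz, "X")  # np.memmap backed by the extracted file
```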

I have struggled for a long time with a similar (albeit more obscure)
problem in PyFITS / astropy.io.fits when it comes to supporting
memory mapping of compressed FITS files. For those unaware, FITS is a
file format used primarily in astronomy.

I have all kinds of wacky ideas for optimizing this, but at the moment
when you load data from a compressed FITS file with memory-mapping
enabled, obviously there's not much benefit because the contents of
the file are uncompressed in memory (there is a *little* benefit in
that the compressed data is mmap'd, but the compressed data is
typically much smaller than the uncompressed data).

Currently, in this case, I just issue a warning when the user explicitly
requests mmap=True but won't get much benefit from it. Maybe np.load
could do the same; I don't have a strong opinion about it. (I only added
the warning in PyFITS because a user requested it and was kind enough to
provide a patch, which seemed reasonable.)

Erik


Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-12-11 Thread Nathaniel Smith
Constructing an array from an iterator is fundamentally different from
constructing an array from an in-memory data structure like a list,
because in the iterator case it's necessary to either use a
single-pass algorithm or else create extra temporary buffers that
cause much higher memory overhead. (Which is undesirable given that
iterators are mostly used exactly in the case where one wants to
reduce memory overhead.)

np.fromiter requires the dtype= argument because this is necessary if
you want to construct the array in a single pass.

np.array(list(iter)) can avoid the dtype argument, because it creates
that large memory buffer. IMO this is better than making
np.array(iter) internally call list(iter) or equivalent, because the
workaround (adding an explicit call to list()) is trivial, while also
making it obvious to the user what the actual cost of their request
is. (Explicit is better than implicit.)
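To make the trade-off concrete, a small comparison of the two constructions (the dtype= and count= arguments are what let np.fromiter run in a single pass without a temporary buffer):

```python
import numpy as np

# Single pass, no intermediate list; dtype is mandatory, and count
# lets NumPy preallocate the output instead of growing it.
a = np.fromiter((i * i for i in range(5)), dtype=np.int64, count=5)

# Buffers everything in memory first (the list), which is exactly
# what makes dtype inference possible.
b = np.array([i * i for i in range(5)])
```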

In addition, the proposed API has a number of infelicities:
- We're generally trying to *reduce* the magic in functions like
np.array (e.g. the discussions of having less magic for lists with
mismatched numbers of elements, or non-list sequences)
- There's a strong convention in Python that when making a function like
np.array generic, it should accept any iter*able* rather than any
iter*ator*. But it would be super confusing if np.array({1: 2}) returned
array([1]), or if np.array("foo") returned array(["f", "o", "o"]), so we
don't actually want to handle all iterables the same. It's somewhat
dubious even for iterators (e.g. someone might want to create an object
array containing an iterator...).

hope that helps,
-n

On Fri, Dec 11, 2015 at 2:27 PM, Stephan Sahm  wrote:
> numpy.fromiter is not numpy.array, nor does it work like
> numpy.array(list(...)), since the dtype argument is required
>
> is there a reason why np.array(...) should not work on iterators? I have
> the feeling that such requests get (repeatedly) dismissed, but so far I
> haven't found a compelling argument for leaving this feature out (to
> recall, it is already implemented in a branch)
>
> Please let me know if you know about an argument,
> best,
> Stephan
>
> On 27 November 2015 at 14:18, Alan G Isaac  wrote:
>>
>> On 11/27/2015 5:37 AM, Stephan Sahm wrote:
>>>
>>> I like to request a generator/iterator support for np.array(...) as far
>>> as list(...) supports it.
>>
>>
>>
>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html
>>
>> hth,
>> Alan Isaac



-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-12-11 Thread Stephan Sahm
numpy.fromiter is not numpy.array, nor does it work like
numpy.array(list(...)), since the dtype argument is required

Is there a reason why np.array(...) should not work on iterators? I have
the feeling that such requests get (repeatedly) dismissed, but so far I
haven't found a compelling argument for leaving this feature out (to
recall, it is already implemented in a branch)

Please let me know if you know about an argument,
best,
Stephan

On 27 November 2015 at 14:18, Alan G Isaac  wrote:

> On 11/27/2015 5:37 AM, Stephan Sahm wrote:
>
>> I like to request a generator/iterator support for np.array(...) as far
>> as list(...) supports it.
>>
>
>
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html
>
> hth,
> Alan Isaac


Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-12-11 Thread Juan Nunez-Iglesias
Nathaniel,

> IMO this is better than making np.array(iter) internally call list(iter)
or equivalent

Yeah but that's not the only option:

from itertools import chain

def fromiter_awesome_edition(iterable):
    it = iter(iterable)  # accept any iterable, not only iterators
    elem = next(it)      # peek at the first element to infer the dtype
    dtype = whatever_numpy_does_to_infer_dtypes_from_lists(elem)  # placeholder
    return np.fromiter(chain([elem], it), dtype=dtype)

I think this would be a huge win for usability; people always get tripped
up by the dtype requirement. I can submit a PR if people like this pattern.

btw, I think np.array(['f', 'o', 'o']) would be exactly the expected result
for np.array('foo'), but I guess that's just me.

Juan.

On Sat, Dec 12, 2015 at 10:12 AM, Nathaniel Smith  wrote:

> Constructing an array from an iterator is fundamentally different from
> constructing an array from an in-memory data structure like a list,
> because in the iterator case it's necessary to either use a
> single-pass algorithm or else create extra temporary buffers that
> cause much higher memory overhead. (Which is undesirable given that
> iterators are mostly used exactly in the case where one wants to
> reduce memory overhead.)
>
> np.fromiter requires the dtype= argument because this is necessary if
> you want to construct the array in a single pass.
>
> np.array(list(iter)) can avoid the dtype argument, because it creates
> that large memory buffer. IMO this is better than making
> np.array(iter) internally call list(iter) or equivalent, because the
> workaround (adding an explicit call to list()) is trivial, while also
> making it obvious to the user what the actual cost of their request
> is. (Explicit is better than implicit.)
>
> In addition, the proposed API has a number of infelicities:
> - We're generally trying to *reduce* the magic in functions like
> np.array (e.g. the discussions of having less magic for lists with
> mismatched numbers of elements, or non-list sequences)
> - There's a strong convention in Python that when making a function like
> np.array generic, it should accept any iter*able* rather than any
> iter*ator*. But it would be super confusing if np.array({1: 2})
> returned array([1]), or if array("foo") returned array(["f", "o",
> "o"]), so we don't actually want to handle all iterables the same.
> It's somewhat dubious even for iterators (e.g. someone might want to
> create an object array containing an iterator...)...
>
> hope that helps,
> -n
>
> On Fri, Dec 11, 2015 at 2:27 PM, Stephan Sahm  wrote:
> > numpy.fromiter is neither numpy.array, nor does it work like
> > numpy.array(list(...)), as the dtype argument is necessary.
> >
> > Is there a reason why np.array(...) should not work on iterators? I have
> > the feeling that such requests get (repeatedly) dismissed, but so far I
> > haven't found a compelling argument for leaving this feature missing (as
> > a reminder, it is already implemented in a branch).
> >
> > Please let me know if you know about an argument,
> > best,
> > Stephan
> >
> > On 27 November 2015 at 14:18, Alan G Isaac  wrote:
> >>
> >> On 11/27/2015 5:37 AM, Stephan Sahm wrote:
> >>>
> >>> I like to request a generator/iterator support for np.array(...) as far
> >>> as list(...) supports it.
> >>
> >>
> >>
> >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html
> >>
> >> hth,
> >> Alan Isaac
>
>
>
> --
> Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-11 Thread Charles R Harris
On Fri, Dec 11, 2015 at 6:25 AM, Thomas Baruchel  wrote:

> From time to time it is asked on forums how to extend the precision of
> computation on Numpy arrays. The most common answer
> given to this question is: use the dtype=object with some arbitrary
> precision module like mpmath or gmpy.
> See
> http://stackoverflow.com/questions/6876377/numpy-arbitrary-precision-linear-algebra
> or http://stackoverflow.com/questions/21165745/precision-loss-numpy-mpmath
> or
> http://stackoverflow.com/questions/15307589/numpy-array-with-mpz-mpfr-values
>
> While this is obviously the most relevant answer for many users, because
> it allows them to use Numpy arrays exactly as they would have used them
> with native types, the drawback is that, from some point of view, "true"
> vectorization is lost.
>
> Over the years I have become very familiar with the extended
> double-double type, which has (on usual architectures) about 32 accurate
> digits and faster arithmetic than arbitrary-precision types. I have even
> used it for research purposes in number theory, and I became convinced
> that it is a wonderful type whenever such precision is sufficient.
>
> I have often implemented it partially under Numpy, most of the time by
> trying to vectorize the libqd library at a low level.
>
> But I recently thought that a very nice and portable way of implementing
> it under Numpy would be to use the existing layer
> of vectorization on floats for computing the arithmetic operations by
> "columns containing half of the numbers" rather than
> by "full numbers". As a proof of concept I wrote the following file:
> https://gist.github.com/baruchel/c86ed748939534d8910d
>
> I converted and vectorized the Algol 60 codes from
> http://szmoore.net/ipdf/documents/references/dekker1971afloating.pdf
> (Dekker, 1971).
>
> A test is provided at the end; for inverting 100,000 numbers, my type is
> about 3 or 4 times faster than GMPY and almost
> 50 times faster than MPmath. It should be even faster for some other
> operations since I had to create another np.ones
> array for testing this type because inversion isn't implemented here
> (which could of course be done). You can run this file yourself
> (you may have to comment out mpmath or gmpy if you don't have them).
>
> I would like to discuss how to make something like this available.
>
> a) Would it be relevant to include this in Numpy? (I am thinking of some
> "contribution" tool rather than the core of Numpy, because it would be
> painful to code all the ufuncs; on the other hand I am pretty sure that
> many would be happy to perform basic arithmetic operations even knowing
> they can't use cos/sin/etc. on this type; in other words, I am not sure
> it would be a good idea to embed it as an everyday type, but I think it
> would be nice to have it quickly available in some way). If you agree,
> how should I code it (the current link is only a proof of concept; I
> would be very happy to code it in some cleaner way)?
>
> b) Or do you think such an attempt should remain external to Numpy itself
> and be released on my Github account without being integrated into Numpy?
>

I think astropy does something similar for time and dates. There has also
been some talk of adding a user type for IEEE 128-bit doubles. I looked
once for relevant code for the latter and, IIRC, the available packages
were GPL :(.

Chuck
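The error-free "two-sum" transformation at the heart of the Dekker arithmetic Thomas describes can be sketched in vectorized NumPy. This is a minimal illustration of the idea (the classic Knuth/Dekker formulation), not the code from the linked gist:

```python
import numpy as np

def two_sum(a, b):
    # Error-free transformation: returns (s, e) where s = fl(a + b)
    # and a + b == s + e exactly, elementwise over float64 arrays.
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e

a = np.array([1.0, 1e16])
b = np.array([1e-20, 1.0])
s, e = two_sum(a, b)
# s holds the rounded sums; e recovers each rounding error exactly.
```

Chaining such transformations over (hi, lo) column pairs is what makes a vectorized double-double type possible without leaving NumPy's float machinery.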


Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-11 Thread Nathaniel Smith
On Dec 11, 2015 7:46 AM, "Charles R Harris" 
wrote:
>
>
>
> On Fri, Dec 11, 2015 at 6:25 AM, Thomas Baruchel  wrote:
>> [...]
>
>
> I think astropy does something similar for time and dates. There has also
been some talk of adding a user type for ieee 128 bit doubles. I've looked
once for relevant code for the latter and, IIRC, the available packages
were GPL :(.

You're probably thinking of the __float128 support in gcc, which relies on
a LGPL (not GPL) runtime support library. (LGPL = any patches to the
support library itself need to remain open source, but no restrictions are
imposed on code that merely uses it.)

Still, probably something that should be done outside of numpy itself for
now.

-n


Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-11 Thread Charles R Harris
On Fri, Dec 11, 2015 at 10:45 AM, Nathaniel Smith  wrote:

> On Dec 11, 2015 7:46 AM, "Charles R Harris" 
> wrote:
> >
> >
> >
> > On Fri, Dec 11, 2015 at 6:25 AM, Thomas Baruchel 
> wrote:
> >> [...]
> >
> >
> > I think astropy does something similar for time and dates. There has
> also been some talk of adding a user type for ieee 128 bit doubles. I've
> looked once for relevant code for the latter and, IIRC, the available
> packages were GPL :(.
>
> You're probably thinking of the __float128 support in gcc, which relies on
> a LGPL (not GPL) runtime support library. (LGPL = any patches to the
> support library itself need to remain open source, but no restrictions are
> imposed on code that merely uses it.)
>
> Still, probably something that should be done outside of numpy itself for
> now.
>

No, there are several other software packages out there. I know of the gcc
version, but was looking for something more portable.

Chuck


Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-11 Thread Eric Moore
I have a mostly complete wrapping of the double-double type from the QD
library (http://crd-legacy.lbl.gov/~dhbailey/mpdist/) into a numpy dtype.
The real problem is, as David pointed out, that user dtypes aren't quite
full equivalents of the builtin dtypes. I can post the code if there is
interest.

Something along the lines of what's being discussed here would be nice,
since the extended type is subject to such variation.
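For anyone wanting to experiment before such a wrapping is available, the storage layout of a double-double can at least be modeled with a structured dtype (the 'hi'/'lo' field names are my own; this gives only the layout, and the missing arithmetic ufuncs are exactly the part a real user dtype has to supply):

```python
import numpy as np

# A double-double stores each number as an unevaluated sum hi + lo
# of two float64s; this structured dtype models only that storage layout.
dd = np.dtype([('hi', np.float64), ('lo', np.float64)])

x = np.zeros(3, dtype=dd)
x['hi'] = [1.0, 2.0, 3.0]
x['lo'] = 1e-17  # corrections far below float64 precision

# Collapsing back to plain float64 discards the extra precision,
# which is why dedicated ufuncs on the dtype are needed.
approx = x['hi'] + x['lo']
```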

Eric

On Fri, Dec 11, 2015 at 12:51 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

>
>
> On Fri, Dec 11, 2015 at 10:45 AM, Nathaniel Smith  wrote:
>
>> On Dec 11, 2015 7:46 AM, "Charles R Harris" 
>> wrote:
>> >
>> >
>> >
>> > On Fri, Dec 11, 2015 at 6:25 AM, Thomas Baruchel 
>> wrote:
>> >> [...]
>> >
>> >
>> > I think astropy does something similar for time and dates. There has
>> also been some talk of adding a user type for ieee 128 bit doubles. I've
>> looked once for relevant code for the latter and, IIRC, the available
>> packages were GPL :(.
>>
>> You're probably thinking of the __float128 support in gcc, which relies
>> on a LGPL (not GPL) runtime support library. (LGPL = any patches to the
>> support library itself need to remain open source, but no restrictions are
>> imposed on code that merely uses it.)
>>
>> Still, probably something that should be done outside of numpy itself for
>> now.
>>
>
> No, there are several other software packages out there. I know of the gcc
> version, but was looking for something more portable.
>
> Chuck
>