[Numpy-discussion] Requesting a PR review for #5822

2016-06-09 Thread Antony Lee
https://github.com/numpy/numpy/pull/5822 is a year-old PR which allows many
random distributions to have a scale of exactly 0 (in which case a stream
of zeros, or of whatever constant value is appropriate, is returned).
It passes all tests and has been sitting there for a while.  Would a core
dev be kind enough to have a look at it?
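
For reference, a small sketch (illustrative only) of the behavior the PR
enables; without it, such calls raise for distributions that require a
strictly positive scale:

import numpy as np

# Illustrative only: with the PR, a zero scale yields the degenerate
# (constant) distribution.
print(np.random.normal(loc=1.5, scale=0.0, size=3))   # [1.5 1.5 1.5]
print(np.random.exponential(scale=0.0, size=3))        # [0. 0. 0.]
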
Thanks!
Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Changing the behavior of (builtins.)round (via the __round__ dunder) to return an integer

2016-04-13 Thread Antony Lee
https://github.com/numpy/numpy/issues/3511 proposed (nearly three years
ago) to return an integer when `builtins.round` (which calls the `__round__`
dunder method, and is referred to as `round` below; not to be confused with
`np.round`) is called with a single argument.  Currently, `round` returns
a floating scalar for numpy scalars, matching the Python 2 behavior.

Python3 changed the behavior of `round` to return an int when it is called
with a single argument (otherwise, the return type matches the type of the
first argument).  I believe this is more intuitive, and is arguably
becoming more important now that numpy is deprecating (via a
VisibleDeprecationWarning) indexing with a float: having to write

array[int(round(some_float))]


is rather awkward.  (Note that I am suggesting to switch to the new
behavior regardless of the version of Python.)
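
For clarity, the Python 3 semantics that the proposal would extend to numpy
scalars (plain-float example; the commented indexing line is hypothetical):

# Python 3 semantics of the builtin round() on plain floats:
print(type(round(3.7)))      # <class 'int'>   (single-argument form)
print(type(round(3.7, 1)))   # <class 'float'> (ndigits given: follows the input type)

# With the proposed change, the indexing above could then be written directly:
# array[round(some_float)]   (hypothetical; names taken from the text above)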

Note that currently the `__round__` dunder is not implemented for arrays
(... see https://github.com/numpy/numpy/issues/6248) so it would be
feasible to always return a signed integer of the same size with an
OverflowError on overflow (at least, any floating point that is round-able
without loss of precision will be covered).  If `__round__` ends up being
implemented for ndarrays too, I guess the correct behavior will be whatever
we come up with for signaling failure in integer operations (see the current
behavior of `np.array([0, 1]) // np.array([0, 1])`).
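
For reference, a sketch of that current behavior (details may vary slightly
across versions): integer division by zero warns through the seterr
machinery and yields 0 rather than raising:

import numpy as np

with np.errstate(divide='warn'):
    # RuntimeWarning for the zero division; the offending element becomes 0.
    print(np.array([0, 1]) // np.array([0, 1]))   # prints [0 1]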

Also note the comment posted by @njsmith on the github issue thread:

I'd be fine with matching python here, but we need to run it by the mailing
list.

Not clear what the right kind of deprecation is... Normally FutureWarning
since there's no error involved, but that would both be very annoying
(basically makes round unusable -- you get this noisy warning even if what
you're doing is round(a).astype(int)), and the change is relatively low
risk compared to most FutureWarning changes, since the actual values
returned are identical before and after the change.


Thoughts?

Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Floor divison on int returns float

2016-04-12 Thread Antony Lee
Whatever the C rules are (which I don't know off the top of my head, but I
guess it must be one of uint64 or int64).  It's not as if conversion to
float64 was lossless:

In [38]: 2**63 - (np.int64(2**62-1) + np.uint64(2**62-1))
Out[38]: 0.0


Note that the result of (np.int64(2**62-1) + np.uint64(2**62-1)) would
actually fit in an int64 (or an uint64), so arguably the conversion to
float makes things worse.
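
For reference, the promotion in question (assuming current numpy behavior):

import numpy as np

# Mixing int64 and uint64 falls back to float64, a conversion that is itself
# lossy for integers above 2**53.
print(np.promote_types(np.int64, np.uint64))   # float64
print((np.int64(1) + np.uint64(1)).dtype)      # float64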

Antony

2016-04-12 19:56 GMT-07:00 Nathaniel Smith <n...@pobox.com>:

> So what type should uint64 + int64 return?
> On Apr 12, 2016 7:17 PM, "Antony Lee" <antony@berkeley.edu> wrote:
>
>> This kind of issue (see also https://github.com/numpy/numpy/issues/3511)
>> has become more annoying now that indexing requires integers (indexing with
>> a float raises a VisibleDeprecationWarning).  The argument "dividing an
>> uint by an int may give a result that does not fit in an uint nor in an
>> int" does not sound very convincing to me, after all even adding two
>> (sized) ints may give a result that does not fit in the same size, but
>> numpy does not upcast everything there:
>>
>> In [17]: np.int32(2**31 - 1) + np.int32(2**31 - 1)
>> Out[17]: -2
>>
>> In [18]: type(np.int32(2**31 - 1) + np.int32(2**31 - 1))
>> Out[18]: numpy.int32
>>
>>
>> I'd think that overflowing operations should just overflow (and possibly
>> raise a warning via the seterr mechanism), but their possibility should not
>> be an argument for modifying the output type.
>>
>> Antony
>>
>> 2016-04-12 17:57 GMT-07:00 T J <tjhn...@gmail.com>:
>>
>>> Thanks Eric.
>>>
>>> Also relevant: https://github.com/numba/numba/issues/909
>>>
>>> Looks like Numba has found a way to avoid this edge case.
>>>
>>>
>>>
>>> On Monday, April 4, 2016, Eric Firing <efir...@hawaii.edu> wrote:
>>>
>>>> On 2016/04/04 9:23 AM, T J wrote:
>>>>
>>>>> I'm on NumPy 1.10.4 (mkl).
>>>>>
>>>>>  >>> np.uint(3) // 2   # 1.0
>>>>>  >>> 3 // 2   # 1
>>>>>
>>>>> Is this behavior expected? It's certainly not desired from my
>>>>> perspective. If this is not a bug, could someone explain the rationale
>>>>> to me.
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>> I agree that it's almost always undesirable; one would reasonably
>>>> expect some sort of int.  Here's what I think is going on:
>>>>
>>>> The odd behavior occurs only with np.uint, which is np.uint64, and when
>>>> the denominator is a signed int.  The problem is that if the denominator is
>>>> negative, the result will be negative, so it can't have the same type as
>>>> the first numerator.  Furthermore, if the denominator is -1, the result
>>>> will be minus the numerator, and that can't be represented by np.uint or
>>>> np.int.  Therefore the result is returned as np.float64.  The
>>>> promotion rules are based on what *could* happen in an operation, not on
>>>> what *is* happening in a given instance.
>>>>
>>>> Eric
>>>>
>>>> ___
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion@scipy.org
>>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Floor divison on int returns float

2016-04-12 Thread Antony Lee
This kind of issue (see also https://github.com/numpy/numpy/issues/3511)
has become more annoying now that indexing requires integers (indexing with
a float raises a VisibleDeprecationWarning).  The argument "dividing an
uint by an int may give a result that does not fit in an uint nor in an
int" does not sound very convincing to me, after all even adding two
(sized) ints may give a result that does not fit in the same size, but
numpy does not upcast everything there:

In [17]: np.int32(2**31 - 1) + np.int32(2**31 - 1)
Out[17]: -2

In [18]: type(np.int32(2**31 - 1) + np.int32(2**31 - 1))
Out[18]: numpy.int32


I'd think that overflowing operations should just overflow (and possibly
raise a warning via the seterr mechanism), but their possibility should not
be an argument for modifying the output type.

Antony

2016-04-12 17:57 GMT-07:00 T J :

> Thanks Eric.
>
> Also relevant: https://github.com/numba/numba/issues/909
>
> Looks like Numba has found a way to avoid this edge case.
>
>
>
> On Monday, April 4, 2016, Eric Firing  wrote:
>
>> On 2016/04/04 9:23 AM, T J wrote:
>>
>>> I'm on NumPy 1.10.4 (mkl).
>>>
>>>  >>> np.uint(3) // 2   # 1.0
>>>  >>> 3 // 2   # 1
>>>
>>> Is this behavior expected? It's certainly not desired from my
>>> perspective. If this is not a bug, could someone explain the rationale
>>> to me.
>>>
>>> Thanks.
>>>
>>
>> I agree that it's almost always undesirable; one would reasonably expect
>> some sort of int.  Here's what I think is going on:
>>
>> The odd behavior occurs only with np.uint, which is np.uint64, and when
>> the denominator is a signed int.  The problem is that if the denominator is
>> negative, the result will be negative, so it can't have the same type as
>> the first numerator.  Furthermore, if the denominator is -1, the result
>> will be minus the numerator, and that can't be represented by np.uint or
>> np.int.  Therefore the result is returned as np.float64.  The promotion
>> rules are based on what *could* happen in an operation, not on what *is*
>> happening in a given instance.
>>
>> Eric
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-18 Thread Antony Lee
In a sense this discussion is really about making np.array(iterable) more
efficient, so I restarted the discussion at
https://mail.scipy.org/pipermail/numpy-discussion/2016-February/075059.html

Antony

2016-02-18 14:21 GMT-08:00 Chris Barker <chris.bar...@noaa.gov>:

> On Thu, Feb 18, 2016 at 10:15 AM, Antony Lee <antony@berkeley.edu>
> wrote:
>
>> Mostly so that there is no performance lost when someone passes
>> range(...) instead of np.arange(...).  At least I had never realized that
>> one is much faster than the other and always just passed range() as a
>> convenience.
>>
>
> Well,  pretty much everything in numpy is faster if you use the numpy
> array version rather than plain python -- this hardly seems like the extra
> code would be worth it.
>
> numpy's array() constructor can (and should) take an arbitrary iterable.
>
> It does make some sense that we might want to special case iterators,
> as you don't want to loop through them too many times, which is what
> np.fromiter() is for.
>
> and _maybe_ it would be worth special casing python lists, as you can
> access items faster, and they are really, really common (or has this
> already been done?), but special casing range() is getting silly. And it
> might be hard to do. At the C level I suppose you could actually know what
> the parameters and state of the range object are and create an array
> directly from that -- but that's what arange is for...
>
> -CHB
>
>
>
>> 2016-02-17 10:50 GMT-08:00 Chris Barker <chris.bar...@noaa.gov>:
>>
>>> On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee <antony@berkeley.edu>
>>> wrote:
>>>
>>>> So how can np.array(range(...)) even work?
>>>>
>>>
>>> range() (in py3) is not a generator, nor is it an iterator. It is a
>>> range object, which is lazily evaluated, and satisfies both the iterator
>>> protocol and the sequence protocol (at least most of it):
>>>
>>> In [*1*]: r = range(10)
>>>
>>>
>>> In [*2*]: r[3]
>>>
>>> Out[*2*]: 3
>>>
>>>
>>> In [*3*]: len(r)
>>>
>>> Out[*3*]: 10
>>>
>>>
>>> In [*4*]: type(r)
>>>
>>> Out[*4*]: range
>>>
>>> In [*9*]: isinstance(r, collections.abc.Sequence)
>>>
>>> Out[*9*]: True
>>>
>>> In [*10*]: l = list()
>>>
>>> In [*11*]: isinstance(l, collections.abc.Sequence)
>>>
>>> Out[*11*]: True
>>>
>>> In [*12*]: isinstance(r, collections.abc.Iterable)
>>>
>>> Out[*12*]: True
>>> I'm still totally confused as to why we'd need to special-case range
>>> when we have arange().
>>>
>>> -CHB
>>>
>>>
>>>
>>> --
>>>
>>> Christopher Barker, Ph.D.
>>> Oceanographer
>>>
>>> Emergency Response Division
>>> NOAA/NOS/OR&R            (206) 526-6959   voice
>>> 7600 Sand Point Way NE   (206) 526-6329   fax
>>> Seattle, WA  98115   (206) 526-6317   main reception
>>>
>>> chris.bar...@noaa.gov
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-18 Thread Antony Lee
Mostly so that there is no performance lost when someone passes range(...)
instead of np.arange(...).  At least I had never realized that one is much
faster than the other and always just passed range() as a convenience.

Antony

2016-02-17 10:50 GMT-08:00 Chris Barker <chris.bar...@noaa.gov>:

> On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee <antony@berkeley.edu>
> wrote:
>
>> So how can np.array(range(...)) even work?
>>
>
> range() (in py3) is not a generator, nor is it an iterator. It is a range
> object, which is lazily evaluated, and satisfies both the iterator protocol
> and the sequence protocol (at least most of it):
>
> In [*1*]: r = range(10)
>
>
> In [*2*]: r[3]
>
> Out[*2*]: 3
>
>
> In [*3*]: len(r)
>
> Out[*3*]: 10
>
>
> In [*4*]: type(r)
>
> Out[*4*]: range
>
> In [*9*]: isinstance(r, collections.abc.Sequence)
>
> Out[*9*]: True
>
> In [*10*]: l = list()
>
> In [*11*]: isinstance(l, collections.abc.Sequence)
>
> Out[*11*]: True
>
> In [*12*]: isinstance(r, collections.abc.Iterable)
>
> Out[*12*]: True
> I'm still totally confused as to why we'd need to special-case range when
> we have arange().
>
> -CHB
>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2016-02-18 Thread Antony Lee
Actually, while working on https://github.com/numpy/numpy/issues/7264 I
realized that the memory efficiency (one-pass) argument is simply incorrect:

import numpy as np

class A:
    def __getitem__(self, i):
        print("A get item", i)
        return [np.int8(1), np.int8(2)][i]
    def __len__(self):
        return 2

print(repr(np.array(A())))

This prints out

A get item 0
A get item 1
A get item 2
A get item 0
A get item 1
A get item 2
A get item 0
A get item 1
A get item 2
array([1, 2], dtype=int8)

i.e. the sequence is "turned into a concrete sequence" no less than 3 times.
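
For comparison, np.fromiter is the genuinely single-pass constructor
discussed earlier in this thread; a small sketch:

import numpy as np

# One pass over the iterator; the dtype must be given up front, and count
# lets numpy preallocate the output.
a = np.fromiter((i * i for i in range(5)), dtype=np.int8, count=5)
print(repr(a))   # array([ 0,  1,  4,  9, 16], dtype=int8)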

Antony

2016-01-19 11:33 GMT-08:00 Stephan Sahm :

> just to not prevent it from the black hole - what about integrating
> fromiter into array? (see the post by Benjamin Root)
>
> for me personally, taking the first element for deducing the dtype would
> be a perfect default way to read generators. If one wants a specific other
> dtype, one could specify it like in the current fromiter method.
>
> On 15 December 2015 at 08:08, Stephan Sahm  wrote:
>
>> I would like to further push Benjamin Root's suggestion:
>>
>> "Therefore, I think it is not out of the realm of reason that passing a
>> generator object and a dtype could then delegate the work under the hood to
>> np.fromiter()? I would even go so far as to raise an error if one passes a
>> generator without specifying dtype to np.array(). The point is to reduce
>> the number of entry points for creating numpy arrays."
>>
>> would this be ok?
>>
>> On Mon, Dec 14, 2015 at 6:50 PM Robert Kern 
>> wrote:
>>
>>> On Mon, Dec 14, 2015 at 5:41 PM, Benjamin Root 
>>> wrote:
>>> >
>>> > Heh, never noticed that. Was it implemented more like a
>>> generator/iterator in older versions of Python?
>>>
>>> No, it predates generators and iterators so it has always had to be
>>> implemented like that.
>>>
>>> --
>>> Robert Kern
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal to add `weights` to `np.percentile` and `np.median`

2016-02-16 Thread Antony Lee
See earlier discussion here: https://github.com/numpy/numpy/issues/6326
Basically, naïvely sorting may be faster than a not-so-optimized version of
quickselect.
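
For concreteness, a naive sort-based sketch of the weighted-percentile
semantics proposed below; the mid-point weighting convention here is my own
assumption, not the proposed C implementation:

import numpy as np

def weighted_percentile(a, q, weights):
    """Sort-based sketch of a weighted percentile (mid-point convention)."""
    a = np.asarray(a, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(a)
    a, w = a[order], w[order]
    cum = (np.cumsum(w) - 0.5 * w) / np.sum(w)   # weighted plotting positions
    return np.interp(np.asarray(q) / 100.0, cum, a)

print(weighted_percentile([1, 2, 3, 4], 50, weights=[1, 1, 1, 1]))   # 2.5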

Antony

2016-02-15 21:49 GMT-08:00 Joseph Fox-Rabinovitz :

> I would like to add a `weights` keyword to `np.partition`,
> `np.percentile` and `np.median`. My reason for doing so is to to allow
> `np.histogram` to process automatic bin selection with weights.
> Currently, weights are not supported for the automatic bin selection
> and would be difficult to support in `auto` mode without having
> `np.percentile` support a `weights` keyword. I suspect that there are
> many other uses for such a feature.
>
> I have taken a preliminary look at the C implementation of the
> partition functions that are the basis for `partition`, `median` and
> `percentile`. I think that it would be possible to add versions (or
> just extend the functionality of existing ones) that check the ratio
> of the weights below the partition point to the total sum of the
> weights instead of just counting elements.
>
> One of the main advantages of such an implementation is that it would
> allow any real weights to be handled correctly, not just integers.
> Complex weights would not be supported.
>
> The purpose of this email is to see if anybody objects, has ideas or
> cares at all about this proposal before I spend a significant amount
> of time working on it. For example, did I miss any functions in my
> list?
>
> Regards,
>
> -Joe
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-15 Thread Antony Lee
Indeed:

In [1]: class C:
   ...:     def __getitem__(self, i):
   ...:         if i < 10: return i
   ...:         else: raise IndexError
   ...:     def __len__(self):
   ...:         return 10
   ...:

In [2]: np.array(C())
Out[2]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


(omitting __len__ results in the creation of an object array, consistent
with the fact that the sequence protocol requires __len__).
Meanwhile, I found a new way to segfault numpy :-)

In [3]: class C:
   ...:     def __getitem__(self, i):
   ...:         if i < 10: return i
   ...:         else: raise IndexError
   ...:     def __len__(self):
   ...:         return 42
   ...:

In [4]: np.array(C())
Fatal Python error: Segmentation fault


2016-02-15 0:10 GMT-08:00 Nathaniel Smith <n...@pobox.com>:

> On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee <antony@berkeley.edu>
> wrote:
> > I wonder whether numpy is using the "old" iteration protocol (repeatedly
> > calling x[i] for increasing i until StopIteration is reached?)  A quick
> > timing shows that it is indeed slower.
>
> Yeah, I'm pretty sure that np.array doesn't know anything about
> "iterable", just about "sequence" (calling x[i] for 0 <= i <
> i.__len__()).
>
> (See Sequence vs Iterable:
> https://docs.python.org/3/library/collections.abc.html)
>
> Personally I'd like it if we could eventually make it so np.array
> specifically looks for lists and only lists, because the way it has so
> many different fallbacks right now creates all confusion between which
> objects are elements. Compare:
>
> In [5]: np.array([(1, 2), (3, 4)]).shape
> Out[5]: (2, 2)
>
> In [6]: np.array([(1, 2), (3, 4)], dtype="i4,i4").shape
> Out[6]: (2,)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-14 Thread Antony Lee
I wonder whether numpy is using the "old" iteration protocol (repeatedly
calling x[i] for increasing i until StopIteration is reached?)  A quick
timing shows that it is indeed slower.
... actually it's not even clear to me what qualifies as a sequence for
`np.array`:

class C:
    def __iter__(self):
        return iter(range(10))  # [0 ... 9] under the new iteration protocol
    def __getitem__(self, i):
        raise IndexError        # [] under the old iteration protocol

np.array(C())
===> array(<__main__.C object at 0x7f3f2128>, dtype=object)


So how can np.array(range(...)) even work?

2016-02-14 22:21 GMT-08:00 Ralf Gommers <ralf.gomm...@gmail.com>:

>
>
> On Sun, Feb 14, 2016 at 10:36 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Sun, Feb 14, 2016 at 7:36 AM, Ralf Gommers <ralf.gomm...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Sun, Feb 14, 2016 at 9:21 AM, Antony Lee <antony@berkeley.edu>
>>> wrote:
>>>
>>>> re: no reason why...
>>>> This has nothing to do with Python2/Python3 (I personally stopped using
>>>> Python2 at least 3 years ago.)  Let me put it this way instead: if
>>>> Python3's "range" (or Python2's "xrange") was not a builtin type but a type
>>>> provided by numpy, I don't think it would be controversial at all to
>>>> provide an `__array__` special method to efficiently convert it to a
>>>> ndarray.  It would be the same if `np.array` used a
>>>> `functools.singledispatch` dispatcher rather than an `__array__` special
>>>> method (which is obviously not possible for chronological reasons).
>>>>
>>>> re: iterable vs iterator: check for the presence of the __next__
>>>> special method (or isinstance(x, Iterable) vs. isinstance(x, Iterator) and
>>>> not isinstance(x, Iterable))
>>>>
>>>
>>> I think it's good to do something about this, but it's not clear what
>>> the exact proposal is. I could image one or both of:
>>>
>>>   - special-case the range() object in array (and asarray/asanyarray?)
>>> such that array(range(N)) becomes as fast as arange(N).
>>>   - special-case all iterators, such that array(range(N)) becomes as
>>> fast as deque(range(N))
>>>
>>
>> I think the last wouldn't help much, as numpy would still need to
>> determine dimensions and type.  I assume that is one of the reasons sparse
>> itself doesn't do that.
>>
>
> Not orders of magnitude, but this shows that there's something to optimize
> for iterators:
>
> In [1]: %timeit np.array(range(10))
> 100 loops, best of 3: 14.9 ms per loop
>
> In [2]: %timeit np.array(list(range(10)))
> 100 loops, best of 3: 9.68 ms per loop
>
> Ralf
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-14 Thread Antony Lee
I was thinking (1) (special-case range()); however (2) may be more
generally applicable and useful.

Antony

2016-02-14 6:36 GMT-08:00 Ralf Gommers <ralf.gomm...@gmail.com>:

>
>
> On Sun, Feb 14, 2016 at 9:21 AM, Antony Lee <antony@berkeley.edu>
> wrote:
>
>> re: no reason why...
>> This has nothing to do with Python2/Python3 (I personally stopped using
>> Python2 at least 3 years ago.)  Let me put it this way instead: if
>> Python3's "range" (or Python2's "xrange") was not a builtin type but a type
>> provided by numpy, I don't think it would be controversial at all to
>> provide an `__array__` special method to efficiently convert it to a
>> ndarray.  It would be the same if `np.array` used a
>> `functools.singledispatch` dispatcher rather than an `__array__` special
>> method (which is obviously not possible for chronological reasons).
>>
>> re: iterable vs iterator: check for the presence of the __next__ special
>> method (or isinstance(x, Iterable) vs. isinstance(x, Iterator) and not
>> isinstance(x, Iterable))
>>
>
> I think it's good to do something about this, but it's not clear what the
> exact proposal is. I could image one or both of:
>
>   - special-case the range() object in array (and asarray/asanyarray?)
> such that array(range(N)) becomes as fast as arange(N).
>   - special-case all iterators, such that array(range(N)) becomes as fast
> as deque(range(N))
>
> or yet something else?
>
> Ralf
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-14 Thread Antony Lee
re: no reason why...
This has nothing to do with Python2/Python3 (I personally stopped using
Python2 at least 3 years ago.)  Let me put it this way instead: if
Python3's "range" (or Python2's "xrange") was not a builtin type but a type
provided by numpy, I don't think it would be controversial at all to
provide an `__array__` special method to efficiently convert it to a
ndarray.  It would be the same if `np.array` used a
`functools.singledispatch` dispatcher rather than an `__array__` special
method (which is obviously not possible for chronological reasons).

re: iterable vs iterator: check for the presence of the __next__ special
method (or isinstance(x, Iterator), vs. isinstance(x, Iterable) and not
isinstance(x, Iterator))
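
Concretely, something along these lines (a sketch):

from collections.abc import Iterable, Iterator

def kind(x):
    # Iterators define __next__; range objects, lists, etc. are iterable
    # without being iterators.
    if isinstance(x, Iterator):
        return "iterator"
    if isinstance(x, Iterable):
        return "iterable, not an iterator"
    return "neither"

print(kind(iter([1, 2])))            # iterator
print(kind(range(3)))                # iterable, not an iterator
print(kind((i for i in range(3))))   # iterator (generators are iterators)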

Antony

2016-02-13 18:48 GMT-08:00 <josef.p...@gmail.com>:

>
>
> On Sat, Feb 13, 2016 at 9:43 PM, <josef.p...@gmail.com> wrote:
>
>>
>>
>> On Sat, Feb 13, 2016 at 8:57 PM, Antony Lee <antony@berkeley.edu>
>> wrote:
>>
>>> Compare (on Python3 -- for Python2, read "xrange" instead of "range"):
>>>
>>> In [2]: %timeit np.array(range(100), np.int64)
>>> 10 loops, best of 3: 156 ms per loop
>>>
>>> In [3]: %timeit np.arange(100, dtype=np.int64)
>>> 1000 loops, best of 3: 853 µs per loop
>>>
>>>
>>> Note that while iterating over a range is not very fast, it is still
>>> much better than the array creation:
>>>
>>> In [4]: from collections import deque
>>>
>>> In [5]: %timeit deque(range(100), 1)
>>> 10 loops, best of 3: 25.5 ms per loop
>>>
>>>
>>> On one hand, special cases are awful. On the other hand, the range
>>> builtin is probably important enough to deserve a special case to make this
>>> construction faster. Or not? I initially opened this as
>>> https://github.com/numpy/numpy/issues/7233 but it was suggested there
>>> that this should be discussed on the ML first.
>>>
>>> (The real issue which prompted this suggestion: I was building sparse
>>> matrices using scipy.sparse.csc_matrix with some indices specified using
>>> range, and that construction step turned out to take a significant portion
>>> of the time because of the calls to np.array).
>>>
>>
>>
>> IMO: I don't see a reason why this should be supported. There is
>> np.arange after all for this use case, and np.fromiter.
>> range and the other guys are iterators, and in several cases we can use
>> larange = list(range(...)) as a shortcut to get a python list, for python 2/3
>> compatibility.
>>
>> I think this might be partially a learning effect in the python 2 to 3
>> transition. After using almost only python 3 for maybe a year, I don't
>> think it's difficult to remember the differences when writing code that is
>> py 2.7 and py 3.x compatible.
>>
>>
>> It's just **another** thing to watch out for if milliseconds matter in
>> your application.
>>
>
>
> side question: Is there a simple way to distinguish a iterator or
> generator from an iterable data structure?
>
> Josef
>
>
>
>>
>> Josef
>>
>>
>>>
>>> Antony
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-13 Thread Antony Lee
Compare (on Python3 -- for Python2, read "xrange" instead of "range"):

In [2]: %timeit np.array(range(100), np.int64)
10 loops, best of 3: 156 ms per loop

In [3]: %timeit np.arange(100, dtype=np.int64)
1000 loops, best of 3: 853 µs per loop


Note that while iterating over a range is not very fast, it is still much
better than the array creation:

In [4]: from collections import deque

In [5]: %timeit deque(range(100), 1)
10 loops, best of 3: 25.5 ms per loop


On one hand, special cases are awful. On the other hand, the range builtin
is probably important enough to deserve a special case to make this
construction faster. Or not? I initially opened this as
https://github.com/numpy/numpy/issues/7233 but it was suggested there that
this should be discussed on the ML first.

(The real issue which prompted this suggestion: I was building sparse
matrices using scipy.sparse.csc_matrix with some indices specified using
range, and that construction step turned out to take a significant portion
of the time because of the calls to np.array).

Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Fixing the dtype of np.full's return value

2015-09-27 Thread Antony Lee
Hi all,

The docstring of np.full indicates that the result of the dtype is
`np.array(fill_value).dtype`, as long as the keyword argument `dtype`
itself is not set.  This is actually not the case: the current
implementation always returns a float array when `dtype` is not set, see
e.g.

In [1]: np.full(1, 1)
Out[1]: array([ 1.])

In [2]: np.full(1, None)
Out[2]: array([ nan])

In [3]: np.full(1, None).dtype
Out[3]: dtype('float64')

In [4]: np.array(None)
Out[4]: array(None, dtype=object)
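
In the meantime, a workaround consistent with the documented intent is to
pass the inferred dtype explicitly (a sketch; the default integer dtype is
platform dependent):

import numpy as np

# Make the dtype explicit, matching what the docstring promises by default.
fill = 1
a = np.full(3, fill, dtype=np.array(fill).dtype)
print(a, a.dtype)   # [1 1 1] int64 (the platform default integer)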

The note about return value of the dtype was actually explicitly discussed
in https://github.com/numpy/numpy/pull/2875 but the tests failed to cover
the case where the `dtype` argument is not passed.

We could either change the docstring to match the current behavior, or fix
the behavior to match what the docstring says (my preference).  @njsmith
mentioned in https://github.com/numpy/numpy/issues/6366 that this may be
acceptable as a bug fix, as "it's a very new function so there probably
aren't many people relying on it" (it was introduced in 1.8).

I guess the options are:
- Fix the behavior outright and squeeze this in 1.10 as a bugfix (my
preference).
- Emit a warning in 1.10, fix in 1.11.
- Do nothing for 1.10, warn in 1.11, fix in 1.12 (at that point the
argument of `np.full` being a very new function starts becoming invalid...).
- Change the docstring.

Thoughts?

Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-06-09 Thread Antony Lee
2015-05-29 14:06 GMT-07:00 Antony Lee antony@berkeley.edu:


 A proof-of-concept implementation, still missing tests, is tracked as
 #5911.  It includes the patch proposed in #5158 as an example of how to
 include an improved version of random.choice.


 Tests are in now (whether we should bundle in pickles of old versions to
 make sure they are still unpickled correctly and outputs of old random
 streams to make sure they are still reproduced is a good question, though).
 Comments welcome.


Kindly bumping the issue.

Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-29 Thread Antony Lee

 A proof-of-concept implementation, still missing tests, is tracked as
 #5911.  It includes the patch proposed in #5158 as an example of how to
 include an improved version of random.choice.


Tests are in now (whether we should bundle in pickles of old versions to
make sure they are still unpickled correctly and outputs of old random
streams to make sure they are still reproduced is a good question, though).
Comments welcome.

Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Antony Lee
Thanks to Nathaniel who has indeed clarified my intent, i.e. the global
RandomState should use the latest implementation, unless explicitly
seeded.  More generally, the `RandomState` constructor is just a thin
wrapper around `seed` with the same signature, so one can swap the version
of the global functions with a call to `np.random.seed(version=...)`.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Antony Lee
2015-05-24 13:30 GMT-07:00 Sturla Molden sturla.mol...@gmail.com:

 On 24/05/15 10:22, Antony Lee wrote:

  Comments, and help for writing tests (in particular to make sure
  backwards compatibility is maintained) are welcome.

 I have one comment, and that is what makes random numbers so special?
 This applies to the rest of NumPy too, fixing a bug can sometimes change
 the output of a function.

 Personally I think we should only make guarantees about the data types,
 array shapes, and things like that, but not about the values. Those who
 need a particular version of NumPy for exact reproducibility should
 install the version of Python and NumPy they need. That is why virtual
 environments exist.


I personally agree with this point of view (see original discussion in
#5299, for example); if it was only up to me at least I'd make
RandomState(seed) default to the latest version rather than the original
one (whether to keep the old versions around is another question).  On the
other hand, I see that this long-standing debate has prevented obvious
improvements from being added sometimes for years (e.g. a patch for
Ziggurat normal variates has been lying around since 2010), or led to
potential API duplication in order to fix some clearly undesirable behavior
(dirichlet returning nan being described as "in a strict sense not really
a bug"(!)), so I'm willing to compromise to get this moving forward.

Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Antony Lee
Hi,

As mentioned in

#1450: Patch with Ziggurat method for Normal distribution
#5158: ENH: More efficient algorithm for unweighted random choice without
replacement
#5299: using `random.choice` to sample integers in a large range
#5851: Bug in np.random.dirichlet for small alpha parameters

some methods on np.random.RandomState are implemented either non-optimally
(#1450, #5158, #5299) or have outright bugs (#5851), but cannot be easily
changed due to backwards compatibility concerns.  While some have suggested
new methods deprecating the old ones (see e.g. #5872), some consensus has
formed around the following ideas (see #5299 for original discussion,
followed by private discussions with @njsmith):

- Backwards compatibility should only be provided to those who were
explicitly instantiating a seeded RandomState object or reseeding a
RandomState object to a given value, and drawing variates from it: using
the global methods (or a None-seeded RandomState) was already
non-reproducible anyways as e.g. other libraries could be drawing variates
from the global RandomState (of which the free functions in np.random are
actually methods).  Thus, the global RandomState object should use the
latest implementation of the methods.

- RandomState(seed) and r = RandomState(...); r.seed(seed) should offer
backwards-compatibility guarantees (see e.g.
https://docs.python.org/3.4/library/random.html#notes-on-reproducibility).

As such, we propose the following improvements to the API:

- RandomState gains a (keyword-only) parameter, version, also accessible
as a read-only attribute.  This indicates the version of the methods on the
object.  The current version of RandomState is retroactively assigned
version 0.  The latest version is available as
np.random.LATEST_VERSION.  Backwards-incompatible improvements to
RandomState methods can be introduced, but they increase LATEST_VERSION.

- The global RandomState is instantiated as
RandomState(version=LATEST_VERSION).

- RandomState() and rs.seed() set the version to LATEST_VERSION.

- RandomState(seed[!=None]) and rs.seed(seed[!=None]) set the version to 0.

A proof-of-concept implementation, still missing tests, is tracked as
#5911.  It includes the patch proposed in #5158 as an example of how to
include an improved version of random.choice.
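
As a usage sketch of the proposed API (none of this exists in released
numpy; the new keyword is left commented out):

import numpy as np

rs_compat = np.random.RandomState(12345)   # seeded: pinned to version 0
rs_fresh = np.random.RandomState()         # unseeded: would track the latest version
# Proposed only -- the version keyword and LATEST_VERSION do not exist yet:
# rs_latest = np.random.RandomState(12345, version=np.random.LATEST_VERSION)
# np.random.seed(12345, version=0)         # reseed the global state, old stream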

Comments, and help for writing tests (in particular to make sure backwards
compatibility is maintained) are welcome.

Antony Lee
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-14 Thread Antony Lee
Another improvement would be to make sure, for integer-valued datasets,
that all bins cover the same number of integers, as it is easy to end up
otherwise with bins effectively wider than others:

hist(np.random.randint(11, size=1))

shows a peak in the last bin, as it covers both 9 and 10.
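
A common workaround, as a sketch (the sample size here is only for
illustration):

import numpy as np
import matplotlib.pyplot as plt

# Put the bin edges on half-integers so that each bin covers exactly one
# integer value.
data = np.random.randint(11, size=10000)
plt.hist(data, bins=np.arange(-0.5, 11.5, 1))
plt.show()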

Antony

2015-04-13 5:02 GMT-07:00 Neil Girdhar mistersh...@gmail.com:

 Can I suggest that we instead add the P-square algorithm for the dynamic
 calculation of histograms?  (
 http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf
 )

 This is already implemented in C++'s boost library (
 http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp
 )

 I implemented it in Boost Python as a module, which I'm happy to share.
 This is much better than fixed-width histograms in practice.  Rather than
 adjusting the number of bins, it adjusts what you really want, which is the
 resolution of the bins throughout the domain.

 Best,

 Neil

 On Sun, Apr 12, 2015 at 4:02 AM, Ralf Gommers ralf.gomm...@gmail.com
 wrote:



 On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río 
 jaime.f...@gmail.com wrote:

 On Sun, Apr 12, 2015 at 12:19 AM, Varun nayy...@gmail.com wrote:


 http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/sta
 tistics/A
 http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/A
 utomating%20Binwidth%20Choice%20for%20Histogram.ipynb

 Long story short, histogram visualisations that depend on numpy (such as
 matplotlib, or  nearly all of them) have poor default behaviour as I
 have to
 constantly play around with  the number of bins to get a good idea of
 what I'm
 looking at. The bins=10 works ok for  up to 1000 points or very normal
 data,
 but has poor performance for anything else, and  doesn't account for
 variability either. I don't have a method easily available to scale the
 number
 of bins given the data.

 R doesn't suffer from these problems and provides methods for use with
 it's
 hist  method. I would like to provide similar functionality for
 matplotlib, to
 at least provide  some kind of good starting point, as histograms are
 very
 useful for initial data discovery.

 The notebook above provides an explanation of the problem as well as
 some
 proposed  alternatives. Use different datasets (type and size) to see
 the
 performance of the  suggestions. All of the methods proposed exist in R
 and
 literature.

 I've put together an implementation to add this new functionality, but
 am
 hesitant to  make a pull request as I would like some feedback from a
 maintainer before doing so.


 +1 on the PR.


 +1 as well.

 Unfortunately we can't change the default of 10, but a number of string
 methods, with a bins=auto or some such name prominently recommended in
 the docstring, would be very good to have.

 Ralf

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] edge-cases of ellipsis indexing

2015-01-05 Thread Antony Lee
I see, thanks!

2015-01-05 2:14 GMT-07:00 Sebastian Berg sebast...@sipsolutions.net:

 On Mo, 2015-01-05 at 14:13 +0530, Maniteja Nandana wrote:
  Hi Anthony,
 
 
  I am not sure whether the following section in documentation is
  relevant to the behavior you were referring to.
 
 
  When an ellipsis (...) is present but has no size (i.e. replaces
  zero :) the result will still always be an array. A view if no
  advanced index is present, otherwise a copy.
 

 Exactly. There are actually three forms of indexing to distinguish.

 1. Indexing with integers (also scalar arrays) matching the number of
 dimensions. This will return a *scalar*.
 2. Slicing, etc. which returns a view. This also occurs as soon there is
 an ellipsis in there (even if it replaces 0 `:`). You should see it as a
 feature to get a view if the result might be a scalar otherwise ;)!
 3. Advanced indexing which cannot be view based and returns a copy.

 - Sebastian


  Here, ...replaces zero :
 
 
 
  Advanced indexing always returns a copy of the data (contrast with
  basic slicing that returns a view).
  And I think it is a view that is returned in this case.
 
 
   a = array([1])
  a
  array([1])
  a[:,0]  # zero  : are present
  Traceback (most recent call last):
File stdin, line 1, in module
  IndexError: too many indices for array
  a[...,0]=2
  a
  array([2])
  a[0] = 3
  a
  array([3])
  a[(0,)] = 4
  a
  array([4])
  a[:
  array([1])
 
 
  Hope I helped.
 
 
  Cheers,
  N.Maniteja.
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] edge-cases of ellipsis indexing

2015-01-04 Thread Antony Lee
While trying to reproduce various fancy indexings for astropy's FITS
sections (a loaded-on-demand array), I found the following interesting
behavior:

 np.array([1])[..., 0]
array(1)
 np.array([1])[0]
1
 np.array([1])[(0,)]
1

The docs say "Ellipsis expand to the number of : objects needed to make a
selection tuple of the same length as x.ndim.", so it's not totally clear
to me how to explain that difference in the results.

Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] truthiness of object arrays

2014-11-13 Thread Antony Lee
On Python3, __nonzero__ is never defined (always raises an AttributeError),
even after calling __bool__.

2014-11-13 5:24 GMT-08:00 Alan G Isaac alan.is...@gmail.com:

 On 11/13/2014 1:19 AM, Antony Lee wrote:
  t.__bool__() also returns True


 But t.__nonzero__() is being called in the `if` test.
 The question is: is the difference between `__nonzero__`
 and `__bool__` intentional.

 By the way, there has been a change in behavior.
 For example, in 1.7.1 if you call `t.__bool__()`
 it raised an attribute error -- unless one first
 called `t.__nonzero__()` and then called `t.__bool__()`,
 which was of course very weird and needed to be fixed.
 Maybe (?) not like this.

 In fact the oddity probably remains but moved. in 1.9.0 I see this:

   np.__version__
 '1.9.0'
   t = np.array(None); t[()] = np.array([None, None])
   t.__nonzero__()
 Traceback (most recent call last):
File stdin, line 1, in module
 AttributeError: 'numpy.ndarray' object has no attribute '__nonzero__'
   t.__bool__()
 True
   t.__nonzero__()
 ValueError: The truth value of an array with more than one element is
 ambiguous. Use a.any() or a.all()

 Alan Isaac


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] truthiness of object arrays

2014-11-13 Thread Antony Lee
Dunno, seems unlikely that something changed with Python 3.4.2...
$ python --version
Python 3.4.2
$ python -c 'import numpy as np; print(np.__version__); t = np.array(None);
t[()] = np.array([None, None]); t.__bool__(); t.__nonzero__()'
1.9.0
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute '__nonzero__'


2014-11-13 10:05 GMT-08:00 Alan G Isaac alan.is...@gmail.com:

 On 11/13/2014 12:37 PM, Antony Lee wrote:
  On Python3, __nonzero__ is never defined (always raises an
 AttributeError), even after calling __bool__.


 The example I posted was Python 3.4.1 with numpy 1.9.0.

 fwiw,
 Alan Isaac

 Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32
 bit (Intel)] on win32
 Type help, copyright, credits or license for more information.
   import numpy as np
   np.__version__
 '1.9.0'
   t = np.array(None); t[()] = np.array([None, None])
   t.__nonzero__()
 Traceback (most recent call last):
File stdin, line 1, in module
 AttributeError: 'numpy.ndarray' object has no attribute '__nonzero__'
   t.__bool__()
 True
   t.__nonzero__()
 ValueError: The truth value of an array with more than one element is
 ambiguous. Use a.any() or a.all()
  


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] truthiness of object arrays

2014-11-12 Thread Antony Lee
I am puzzled by the following (numpy 1.9.0, python 3.4.2):

In [1]: t = array(None); t[()] = array([None, None])  # Construct a 0d
array of dtype object, containing a single numpy array with 2 elements

In [2]: bool(t)
Out[2]: True

In [3]: if t: pass
---
ValueError                                Traceback (most recent call last)
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()

I thought that "if x" simply calls bool(x), but apparently this is not even
the case...

Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Broadcasting with np.logical_and.reduce

2014-09-12 Thread Antony Lee
I am not using asarray here.  Sorry, but I don't see how this is relevant
-- my comparison with np.add.reduce is simply that when a list of float
arrays is passed to np.add.reduce, broadcasting happens as usual, but not
when a list of bool arrays is passed to np.logical_and.reduce.

2014-09-12 0:48 GMT-07:00 Sebastian Berg sebast...@sipsolutions.net:

 On Do, 2014-09-11 at 22:54 -0700, Antony Lee wrote:
  Hi,
  I thought that ufunc.reduce performs broadcasting, but it seems a bit
  confused by boolean arrays:
 
  ipython with pylab mode on
  In [1]: add.reduce([array([1, 2]), array([1])])
  Out[1]: array([2, 3])
  In [2]: logical_and.reduce([array([True, False], dtype=bool),
  array([True], dtype=bool)])
 
 ---
  ValueErrorTraceback (most recent call
  last)
  ipython-input-2-bedbab4c13e1 in module()
   1 logical_and.reduce([array([True, False], dtype=bool),
  array([True], dtype=bool)])
 
  ValueError: The truth value of an array with more than one element is
  ambiguous. Use a.any() or a.all()
 
  Am I missing something here?
 

 `np.asarray([array([1, 2]), array([1])])` is an object array, not a
 boolean array. You probably want to concatenate them.

 - Sebastian


  Thanks,
  Antony
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Broadcasting with np.logical_and.reduce

2014-09-12 Thread Antony Lee
I see.  I went back to the documentation of ufunc.reduce and this is not
explicitly mentioned although a posteriori it makes sense; perhaps this can
be made clearer there?
Antony

2014-09-12 2:22 GMT-07:00 Robert Kern robert.k...@gmail.com:

 On Fri, Sep 12, 2014 at 10:04 AM, Antony Lee antony@berkeley.edu
 wrote:
  I am not using asarray here.  Sorry, but I don't see how this is
 relevant --
  my comparison with np.add.reduce is simply that when a list of float
 arrays
  is passed to np.add.reduce, broadcasting happens as usual, but not when a
  list of bool arrays is passed to np.logical_and.reduce.

 But np.logical_and.reduce() *does* use asarray() when it is given a
 list object (all ufunc .reduce() methods do this). In both cases, you
 get a dtype=object array. This means that the ufunc will use the
 dtype=object inner loop, not the dtype=bool inner loop. For np.add,
 this isn't a problem. It just calls the __add__() method on the first
 object which, since it's an ndarray, calls np.add() again to do the
 actual work, this time using the appropriate dtype inner loop for the
 inner objects. But np.logical_and is different! For the dtype=object
 inner loop, it directly calls bool(x) on each item of the object
 array; it doesn't defer to any other method that might do the
 computation. bool(almost_any_ndarray) raises the ValueError that you
 saw. np.logical_and.reduce([x, y]) is not the same as
 np.logical_and(x, y). You can see how the dtype=object inner loop of
 np.logical_and() works by directly constructing dtype=object shape-()
 arrays:

 [~]
 |14 x
 array(None, dtype=object)

 [~]
 |15 x[()] = np.array([True, False])

 [~]
 |16 x
 array(array([ True, False], dtype=bool), dtype=object)

 [~]
 |17 y = np.array(None, dtype=object)

 [~]
 |18 y[()] = np.array([[True], [False]])

 [~]
 |19 y
 array(array([[ True],
[False]], dtype=bool), dtype=object)

 [~]
 |20 np.logical_and(x, y)
 ---
 ValueErrorTraceback (most recent call last)
 ipython-input-20-17705aa17a6f in module()
  1 np.logical_and(x, y)

 ValueError: The truth value of an array with more than one element is
 ambiguous. Use a.any() or a.all()

 --
 Robert Kern
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Broadcasting with np.logical_and.reduce

2014-09-12 Thread Antony Lee
I read the Methods section of the ufuncs doc page (
http://docs.scipy.org/doc/numpy/reference/ufuncs.html#methods) again and I
think this could be made clearer simply by replacing the first sentence from

All ufuncs have four methods.

to

All ufuncs have five methods that operate on array-like objects. (yes,
there's also at, which seems to have been added later to the doc...)

This would make it somewhat clearer that

logical_and.reduce([array([True, False], dtype=bool), array([True],
dtype=bool)])

interprets the single list argument as an array-like (of dtype object)
rather than as an iterable over which to reduce (as python's builtin reduce
would).
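
A small example of the difference (a sketch; the failing call is left
commented out):

import functools
import numpy as np

a = np.array([True, False])
c = np.array([True])                 # different length from a

# Python's reduce applies the ufunc pairwise, broadcasting at each step:
print(functools.reduce(np.logical_and, [a, c]))    # [ True False]

# ufunc.reduce instead converts the list to a (here dtype=object) array
# first, whose inner loop calls bool() on each element:
# np.logical_and.reduce([a, c])    # ValueError: truth value ... is ambiguous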

In fact there is another point in that paragraph that could be improved;
namely axis does not have to be an integer for reduce.

Antony

2014-09-12 10:46 GMT-07:00 Robert Kern robert.k...@gmail.com:

 On Fri, Sep 12, 2014 at 5:46 PM, Robert Kern robert.k...@gmail.com
 wrote:
  On Fri, Sep 12, 2014 at 5:44 PM, Antony Lee antony@berkeley.edu
 wrote:
  I see.  I went back to the documentation of ufunc.reduce and this is not
  explicitly mentioned although a posteriori it makes sense; perhaps this
 can
  be made clearer there?
 
  Please recommend the documentation you would like to see.

 Specifically, the behavior I described is the interaction of several
 different things, but you don't mention which part of it is not
 explicitly mentioned.

 --
 Robert Kern
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Broadcasting with np.logical_and.reduce

2014-09-11 Thread Antony Lee
Hi,
I thought that ufunc.reduce performs broadcasting, but it seems a bit
confused by boolean arrays:

ipython with pylab mode on
In [1]: add.reduce([array([1, 2]), array([1])])
Out[1]: array([2, 3])
In [2]: logical_and.reduce([array([True, False], dtype=bool), array([True],
dtype=bool)])
---
ValueError                                Traceback (most recent call last)
<ipython-input-2-bedbab4c13e1> in <module>()
 1 logical_and.reduce([array([True, False], dtype=bool), array([True],
dtype=bool)])

ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()

Am I missing something here?

Thanks,
Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] multiprocessing, numpy and 32-64 bit cohabitation

2013-09-20 Thread Antony Lee
Thanks a lot!
Antony


2013/9/20 Henry Gomersall h...@cantab.net

 On 18/09/13 01:51, Antony Lee wrote:
  While I realize that this is certainly tweaking multiprocessing beyond
  its specifications, I would like to use it on Windows to start a
  32-bit Python process from a 64-bit Python process (use case: I need
  to interface with a 64-bit DLL and use an extension (pyFFTW) for which
  I can only find a 32-bit compiled version (yes, I could try to install
  MSVC and compile it myself but I'm trying to avoid that...))

 There is now a release on PyPI including installers for both 32- and
 64-bit Python 2.7, 3.2 and 3.3.

 The long double schemes are ignored as on 64-bit windows that type
 simply maps to double (though it should be seamless from the
 Python/Numpy end).

 All tests satisfied :) (that was some work!)

 Cheers,

 Henry
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] multiprocessing, numpy and 32-64 bit cohabitation

2013-09-19 Thread Antony Lee
2013/9/19 Robert Kern robert.k...@gmail.com

 On Thu, Sep 19, 2013 at 5:58 PM, Antony Lee antony@berkeley.edu
 wrote:
 
  Henry: thanks a lot, that would be very appreciated regardless of
 whether I end up using it in this specific project or not.
  Other replies below.
 
  Antony
 
  2013/9/19 Robert Kern robert.k...@gmail.com
 
  On Thu, Sep 19, 2013 at 2:40 AM, Antony Lee antony@berkeley.edu
 wrote:
  
   Thanks, I didn't know that multiprocessing Managers could be used
 with processes not started by multiprocessing itself...  I will give them a
 try.
   I just need to compute FFTs, but speed is a real issue for me (I am
 using the results for real-time feedback).
 
  I am pretty sure that the overhead of communicating a large array from
 one process to another will vastly overwhelm any speed gains you get by
 using pyFFTW over numpy.fft.
 
  I would have hoped that the large arrays are simply written (from the
 beginning) to shared memory (what multiprocessing.sharedctypes.Array seems
 to do(?)) and that interprocess communication would be cheap enough (but
 what do I know about that).

 It certainly won't be automatic just by passing a numpy array to the
 manager. You will have to manually create the shared memory, pass its
 handle to the other process, and copy into it. But even the copy of the
 array may overwhelm the speed gains between PyFFTW and numpy.fft. If you
 can set it up such that the subprocess owns the shared memory for both
 input and output and the GUI process always writes into the input shared
 array directly and reads out the output shared array, then might work out
 okay. This works well when the inputs/outputs are always the same size.

The arrays would always be the same size, and there is no array copy
involved, as (I think that) I can have the C dll directly write whatever
data needs to be analyzed to the shared memory array -- basically what
you're suggesting.
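
A minimal sketch of that shared-memory setup (the buffer size and dtype below
are placeholders, not values from this thread):

import ctypes
import numpy as np
from multiprocessing import sharedctypes

# Allocate a fixed-size block of shared memory (here 1024 doubles) and view
# it as a numpy array without copying; the producer writes into it and the
# analysis process reads the very same buffer.
raw = sharedctypes.RawArray(ctypes.c_double, 1024)
shared = np.frombuffer(raw, dtype=np.float64)

shared[:4] = [1.0, 2.0, 3.0, 4.0]  # stand-in for the producer's write
print(shared[:4])                  # the other side sees the same data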


 --
 Robert Kern

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] multiprocessing, numpy and 32-64 bit cohabitation

2013-09-19 Thread Antony Lee
Henry: thanks a lot, that would be very appreciated regardless of whether I
end up using it in this specific project or not.
Other replies below.

Antony

2013/9/19 Robert Kern robert.k...@gmail.com

 On Thu, Sep 19, 2013 at 2:40 AM, Antony Lee antony@berkeley.edu
 wrote:
 
  Thanks, I didn't know that multiprocessing Managers could be used with
 processes not started by multiprocessing itself...  I will give them a try.
  I just need to compute FFTs, but speed is a real issue for me (I am
 using the results for real-time feedback).

 I am pretty sure that the overhead of communicating a large array from one
 process to another will vastly overwhelm any speed gains you get by using
 pyFFTW over numpy.fft.

I would have hoped that the large arrays are simply written (from the
beginning) to shared memory (what multiprocessing.sharedctypes.Array seems
to do(?)) and that interprocess communication would be cheap enough (but
what do I know about that).



  To be honest I don't know yet if the FFTs are going to be the limiting
 step but I thought I may as well give pyFFTW a try and ran into that
 issue...

 In that case, thinking about multiprocessing or even pyFFTW is far too
 premature. Implement your code with numpy.fft and see what performance you
 actually get.

There is another (and, in fact, main) reason for me to use multiprocessing:
the main app runs a GUI and running the data analysis in the same process
just makes it painfully slow (I have tried that).  Instead, running the
data analysis in a separate process keeps the GUI responsive.  Now whether
the data analysis process should use numpy.fft or pyFFTW is a separate
question; I realize that the gains from pyFFTW may probably be negligible
compared to the other costs (... including the costs of tweaking
multiprocessing beyond its specifications) but I was just giving it a try
when I ran into the issue and was just puzzled by the error message I had
never seen before.

 --
 Robert Kern
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] multiprocessing, numpy and 32-64 bit cohabitation

2013-09-18 Thread Antony Lee
Thanks, I didn't know that multiprocessing Managers could be used with
processes not started by multiprocessing itself...  I will give them a try.
I just need to compute FFTs, but speed is a real issue for me (I am using
the results for real-time feedback).  To be honest I don't know yet if the
FFTs are going to be the limiting step but I thought I may as well give
pyFFTW a try and ran into that issue...
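
A minimal sketch of such a Manager setup (the exposed function, address and
authkey are placeholders; numpy.fft stands in for the pyFFTW call):

# fft_server.py -- run under the 32-bit Python
from multiprocessing.managers import BaseManager
import numpy as np

def compute_fft(data):
    # stand-in for whatever pyFFTW call is actually needed
    return np.fft.fft(np.asarray(data))

class FFTManager(BaseManager):
    pass

FFTManager.register("compute_fft", callable=compute_fft)

if __name__ == "__main__":
    manager = FFTManager(address=("127.0.0.1", 50000), authkey=b"secret")
    manager.get_server().serve_forever()

# fft_client.py -- run under the 64-bit Python
from multiprocessing.managers import BaseManager

class FFTManager(BaseManager):
    pass

FFTManager.register("compute_fft")

if __name__ == "__main__":
    manager = FFTManager(address=("127.0.0.1", 50000), authkey=b"secret")
    manager.connect()
    print(manager.compute_fft([0.0, 1.0, 0.0, -1.0]))
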
Antony


2013/9/18 Robert Kern robert.k...@gmail.com

 On Wed, Sep 18, 2013 at 1:51 AM, Antony Lee antony@berkeley.edu
 wrote:
 
  Hi all,
 
  While I realize that this is certainly tweaking multiprocessing beyond
 its specifications, I would like to use it on Windows to start a 32-bit
 Python process from a 64-bit Python process (use case: I need to interface
 with a 64-bit DLL and use an extension (pyFFTW) for which I can only find a
 32-bit compiled version (yes, I could try to install MSVC and compile it
 myself but I'm trying to avoid that...))

 Just use subprocess to start up the 32-bit Python. If you want to use the
 multiprocessing tools for communicating data, use a Manager server in the
 32-bit Python to communicate over a socket.

   http://docs.python.org/2/library/multiprocessing#managers
   http://docs.python.org/2/library/multiprocessing#using-a-remote-manager

 It is possible that this won't work if the protocol assumes that the
 bitness is the same between server and client (e.g. struct.pack('Q', ...)),
 but I suspect this is not the case.

 You may also consider writing a small server using pyzmq or similar. I am
 guessing that you are just calling one function from pyFFTW and getting the
 result back. A simple REQ/REP server is easy to write with pyzmq. Do you
 need to use pyFFTW for some specific functionality that is not available in
 numpy.fft or scipy.fftpack?
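
A tiny REQ/REP sketch along those lines (port, dtype and the raw-bytes
framing are placeholders; numpy.fft again stands in for the pyFFTW call):

# fft_zmq_server.py -- run under the 32-bit Python
import zmq
import numpy as np

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://127.0.0.1:5555")
while True:
    data = np.frombuffer(sock.recv(), dtype=np.float64)  # raw doubles in
    sock.send(np.fft.fft(data).tobytes())                # raw complexes out

# fft_zmq_client.py -- run under the 64-bit Python
import zmq
import numpy as np

ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect("tcp://127.0.0.1:5555")
sock.send(np.arange(8, dtype=np.float64).tobytes())
print(np.frombuffer(sock.recv(), dtype=np.complex128))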

 --
 Robert Kern

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] multiprocessing, numpy and 32-64 bit cohabitation

2013-09-17 Thread Antony Lee
Hi all,

While I realize that this is certainly tweaking multiprocessing beyond its
specifications, I would like to use it on Windows to start a 32-bit Python
process from a 64-bit Python process (use case: I need to interface with a
64-bit DLL and use an extension (pyFFTW) for which I can only find a 32-bit
compiled version (yes, I could try to install MSVC and compile it myself
but I'm trying to avoid that...))

In fact, this is easy to do by using multiprocessing.set_executable
(...while that may not be its original role):

import multiprocessing as mp
import imp, site, sys

if "32" in sys.executable:  # checking for my 32-bit Python install
    del sys.path[1:]  # recompute sys.path
    print(sys.path)
    site.main()
    print(sys.path)  # now points to the 32bit site-packages

import numpy

if __name__ == '__main__':
    # path of my 32-bit Python install
    mp.set_executable(sys.executable.replace("33", "33-32"))
    mp.Process(target=lambda: None).start()

The sys.path modifications are needed as otherwise the child process
inherits the parent's sys.path and importing numpy (from the 64-bit path)
fails as it is "not a valid Win32 application", complains Python (rightly).

However, even after the sys.path modifications, the numpy import fails with
the error message (that I had never seen before):

sorry, I can't copy paste from the Windows command prompt...
from . import multiarray  # <- numpy/core/__init__.py, line 5
SystemError: initialization of multiarray raised an unreported exception

Any hints as to how this could be fixed would be most welcome.

Thanks in advance,

Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Python3, genfromtxt and unicode

2012-05-01 Thread Antony Lee
Sure, I will.  Right now my solution is to use genfromtxt once with bytes
and auto-dtype detection, then modify the resulting dtype, replacing bytes
with unicodes, and use that new dtype for a second round of genfromtxt.  A
bit awkward but that gets the job done.
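
A sketch of that two-pass approach (the field-conversion details are assumed,
and depending on the numpy version the second pass may need an explicit
converter or encoding for the string columns):

import io
import numpy as np

s = io.BytesIO(b"abc 1\ndef 2")

# First pass: let genfromtxt guess the dtypes (string columns come back as
# bytes fields of the right length).
t = np.genfromtxt(s, dtype=None)

# Replace every bytes field ('S') by a unicode field ('U') of the same
# length; other fields are kept as they are.
fields = []
for name in t.dtype.names:
    dt = t.dtype[name]
    fields.append((name, "U%d" % dt.itemsize if dt.kind == "S" else dt))

# Second pass with the converted dtype.
s.seek(0)
t2 = np.genfromtxt(s, dtype=np.dtype(fields))
print(t2, t2.dtype)
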
Antony Lee

2012/5/1 Charles R Harris charlesr.har...@gmail.com



 On Fri, Apr 27, 2012 at 8:17 PM, Antony Lee antony@berkeley.eduwrote:

 With bytes fields, genfromtxt(dtype=None) sets the sizes of the fields to
 the largest number of chars (npyio.py line 1596), but it doesn't do the
 same for unicode fields, which is a pity.  See example below.
 I tried to change npyio.py around line 1600 to add that but it didn't
 work; from my limited understanding the problem comes earlier, in the way
 StringBuilder is defined(?).
 Antony Lee

 import io, numpy as np
 s = io.BytesIO()
 s.write(b"abc 1\ndef 2")
 s.seek(0)
 t = np.genfromtxt(s, dtype=None)  # (or converters={0: bytes})
 print(t, t.dtype)  # -> [(b'a', 1) (b'b', 2)] [('f0', '|S1'), ('f1', 'i8')]
 s.seek(0)
 t = np.genfromtxt(s, dtype=None,
                   converters={0: lambda s: s.decode("utf-8")})
 print(t, t.dtype)  # -> [('', 1) ('', 2)] [('f0', 'U0'), ('f1', 'i8')]


 Could you open a ticket for this?

 Chuck


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Python3, genfromtxt and unicode

2012-04-27 Thread Antony Lee
With bytes fields, genfromtxt(dtype=None) sets the sizes of the fields to
the largest number of chars (npyio.py line 1596), but it doesn't do the
same for unicode fields, which is a pity.  See example below.
I tried to change npyio.py around line 1600 to add that but it didn't work;
from my limited understanding the problem comes earlier, in the way
StringBuilder is defined(?).
Antony Lee

import io, numpy as np
s = io.BytesIO()
s.write(b"abc 1\ndef 2")
s.seek(0)
t = np.genfromtxt(s, dtype=None)  # (or converters={0: bytes})
print(t, t.dtype)  # -> [(b'a', 1) (b'b', 2)] [('f0', '|S1'), ('f1', 'i8')]
s.seek(0)
t = np.genfromtxt(s, dtype=None,
                  converters={0: lambda s: s.decode("utf-8")})
print(t, t.dtype)  # -> [('', 1) ('', 2)] [('f0', 'U0'), ('f1', 'i8')]
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] unicode string for specifying dtype

2010-11-16 Thread Antony Lee
I just ran into the following:

>>> np.dtype(u"f4")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: data type not understood

Is that the expected behaviour?
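
A minimal workaround sketch, assuming the failure comes from the spec being a
unicode string rather than a native str:

import numpy as np

# Coerce the unicode spec to a native str before handing it to np.dtype.
dt = np.dtype(str(u"f4"))
print(dt)  # -> float32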

Thanks in advance,
Antony Lee
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion