[Numpy-discussion] Requesting a PR review for #5822
https://github.com/numpy/numpy/pull/5822 is a year-old PR which allows many random distributions to have a scale of exactly 0 (in which case a stream of whatever constant value is appropriate is returned). It passes all tests and has been sitting there for a while. Would a core dev be kind enough to have a look at it? Thanks!

Antony
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
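[Editorial sketch of the behavior the PR enables — current NumPy releases, where it has long since been merged, accept scale=0 for many distributions:]

```python
import numpy as np

# With scale=0, each draw degenerates to the distribution's constant value:
# normal(loc, 0) always yields loc, exponential(0) always yields 0.
rng = np.random.RandomState(0)
print(rng.normal(loc=5.0, scale=0.0, size=4))  # every entry equals 5.0
print(rng.exponential(scale=0.0, size=3))      # every entry equals 0.0
```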
[Numpy-discussion] Changing the behavior of (builtins.)round (via the __round__ dunder) to return an integer
https://github.com/numpy/numpy/issues/3511 proposed (nearly three years ago) to return an integer when `builtins.round` (which calls the `__round__` dunder method; hereafter just `round`, not to be confused with `np.round`) is called with a single argument. Currently, `round` returns a floating scalar for numpy scalars, matching the Python 2 behavior. Python 3 changed the behavior of `round` to return an int when it is called with a single argument (otherwise, the return type matches the type of the first argument). I believe this is more intuitive, and it is arguably becoming more important now that numpy is deprecating (via a VisibleDeprecationWarning) indexing with a float: having to write array[int(round(some_float))] is rather awkward. (Note that I am suggesting to switch to the new behavior regardless of the version of Python.)

Note that currently the `__round__` dunder is not implemented for arrays (see https://github.com/numpy/numpy/issues/6248), so it would be feasible to always return a signed integer of the same size, with an OverflowError on overflow (at least, any floating point value that is round-able without loss of precision will be covered). If `__round__` ends up being implemented for ndarrays too, I guess the correct behavior will be whatever we come up with for signaling failure in integer operations (see the current behavior of `np.array([0, 1]) // np.array([0, 1])`).

Also note the comment posted by @njsmith on the github issue thread:

> I'd be fine with matching python here, but we need to run it by the mailing
> list. Not clear what the right kind of deprecation is... Normally
> FutureWarning since there's no error involved, but that would both be very
> annoying (basically makes round unusable -- you get this noisy warning even
> if what you're doing is round(a).astype(int)), and the change is relatively
> low risk compared to most FutureWarning changes, since the actual values
> returned are identical before and after the change.

Thoughts?
Antony
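[For concreteness, the Python 3 semantics the proposal wants numpy scalars to match, plus the workaround the message complains about:]

```python
import numpy as np

# Python 3: round() with a single argument returns an int (banker's rounding);
# with ndigits given, the return type matches the input type.
print(round(2.5))       # 2 (an int)
print(round(2.5, 0))    # 2.0 (a float)

# The awkward workaround needed while round() on a numpy scalar returns
# a float and float indexing is deprecated:
a = np.arange(10)
some_float = 3.7
print(a[int(round(some_float))])  # 4
```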
Re: [Numpy-discussion] Floor divison on int returns float
Whatever the C rules are (which I don't know off the top of my head, but I guess it must be one of uint64 or int64). It's not as if conversion to float64 were lossless:

In [38]: 2**63 - (np.int64(2**62-1) + np.uint64(2**62-1))
Out[38]: 0.0

Note that the result of (np.int64(2**62-1) + np.uint64(2**62-1)) would actually fit in an int64 (or a uint64), so arguably the conversion to float makes things worse.

Antony

2016-04-12 19:56 GMT-07:00 Nathaniel Smith <n...@pobox.com>:
> So what type should uint64 + int64 return?
> On Apr 12, 2016 7:17 PM, "Antony Lee" <antony@berkeley.edu> wrote:
>
>> [quoted text clipped; the earlier messages in this thread are reproduced
>> in full below]
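[The promotion rule Eric describes, and the lossiness Antony points out, can both be checked directly — a sketch:]

```python
import numpy as np

# uint64 and int64 have no common integer supertype in numpy's promotion
# table, so their combination is resolved to float64:
print(np.result_type(np.uint64, np.int64))  # float64

# float64 has a 53-bit significand, so the conversion is lossy near 2**63:
exact = 2**63 - 1
print(int(np.float64(exact)))           # 9223372036854775808, i.e. 2**63
print(int(np.float64(exact)) == exact)  # False -- one ulp of loss
```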
Re: [Numpy-discussion] Floor divison on int returns float
This kind of issue (see also https://github.com/numpy/numpy/issues/3511) has become more annoying now that indexing requires integers (indexing with a float raises a VisibleDeprecationWarning). The argument "dividing an uint by an int may give a result that does not fit in an uint nor in an int" does not sound very convincing to me; after all, even adding two (sized) ints may give a result that does not fit in the same size, but numpy does not upcast everything there:

In [17]: np.int32(2**31 - 1) + np.int32(2**31 - 1)
Out[17]: -2

In [18]: type(np.int32(2**31 - 1) + np.int32(2**31 - 1))
Out[18]: numpy.int32

I'd think that overflowing operations should just overflow (and possibly raise a warning via the seterr mechanism), but their possibility should not be an argument for modifying the output type.

Antony

2016-04-12 17:57 GMT-07:00 T J:
> Thanks Eric.
>
> Also relevant: https://github.com/numba/numba/issues/909
>
> Looks like Numba has found a way to avoid this edge case.
>
> On Monday, April 4, 2016, Eric Firing wrote:
>
>> On 2016/04/04 9:23 AM, T J wrote:
>>
>>> I'm on NumPy 1.10.4 (mkl).
>>>
>>> >>> np.uint(3) // 2   # 1.0
>>> >>> 3 // 2            # 1
>>>
>>> Is this behavior expected? It's certainly not desired from my
>>> perspective. If this is not a bug, could someone explain the rationale
>>> to me.
>>>
>>> Thanks.
>>
>> I agree that it's almost always undesirable; one would reasonably expect
>> some sort of int. Here's what I think is going on:
>>
>> The odd behavior occurs only with np.uint, which is np.uint64, and when
>> the denominator is a signed int. The problem is that if the denominator
>> is negative, the result will be negative, so it can't have the same type
>> as the numerator. Furthermore, if the denominator is -1, the result will
>> be minus the numerator, and that can't be represented by np.uint or
>> np.int. Therefore the result is returned as np.float64. The promotion
>> rules are based on what *could* happen in an operation, not on what *is*
>> happening in a given instance.
>>
>> Eric
Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
In a sense this discussion is really about making np.array(iterable) more efficient, so I restarted the discussion at https://mail.scipy.org/pipermail/numpy-discussion/2016-February/075059.html

Antony

2016-02-18 14:21 GMT-08:00 Chris Barker <chris.bar...@noaa.gov>:
> On Thu, Feb 18, 2016 at 10:15 AM, Antony Lee <antony@berkeley.edu> wrote:
>
>> Mostly so that there is no performance loss when someone passes
>> range(...) instead of np.arange(...). At least I had never realized that
>> one is much faster than the other and always just passed range() as a
>> convenience.
>
> Well, pretty much everything in numpy is faster if you use the numpy
> array version rather than plain python -- this hardly seems like the
> extra code would be worth it.
>
> numpy's array() constructor can (and should) take an arbitrary iterable.
>
> It does make some sense that we might want to special case iterators,
> as you don't want to loop through them too many times, which is what
> np.fromiter() is for.
>
> and _maybe_ it would be worth special casing python lists, as you can
> access items faster, and they are really, really common (or has this
> already been done?), but special casing range() is getting silly. And it
> might be hard to do. At the C level I suppose you could actually know
> what the parameters and state of the range object are and create an
> array directly from that -- but that's what arange is for...
>
> -CHB
>
>> [quoted text clipped; the earlier messages in this thread are reproduced
>> in full below]
Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
Mostly so that there is no performance loss when someone passes range(...) instead of np.arange(...). At least I had never realized that one is much faster than the other and always just passed range() as a convenience.

Antony

2016-02-17 10:50 GMT-08:00 Chris Barker <chris.bar...@noaa.gov>:
> On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee <antony@berkeley.edu> wrote:
>
>> So how can np.array(range(...)) even work?
>
> range() (in py3) is not a generator, nor is it an iterator. it is a range
> object, which is lazily evaluated, and satisfies both the iterator
> protocol and the sequence protocol (at least most of it):
>
> In [1]: r = range(10)
>
> In [2]: r[3]
> Out[2]: 3
>
> In [3]: len(r)
> Out[3]: 10
>
> In [4]: type(r)
> Out[4]: range
>
> In [9]: isinstance(r, collections.abc.Sequence)
> Out[9]: True
>
> In [10]: l = list()
>
> In [11]: isinstance(l, collections.abc.Sequence)
> Out[11]: True
>
> In [12]: isinstance(r, collections.abc.Iterable)
> Out[12]: True
>
> I'm still totally confused as to why we'd need to special-case range when
> we have arange().
>
> -CHB
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R        (206) 526-6959 voice
> 7600 Sand Point Way NE  (206) 526-6329 fax
> Seattle, WA 98115       (206) 526-6317 main reception
>
> chris.bar...@noaa.gov
Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators
Actually, while working on https://github.com/numpy/numpy/issues/7264 I realized that the memory efficiency (one-pass) argument is simply incorrect:

    import numpy as np

    class A:
        def __getitem__(self, i):
            print("A get item", i)
            return [np.int8(1), np.int8(2)][i]
        def __len__(self):
            return 2

    print(repr(np.array(A())))

This prints out

    A get item 0
    A get item 1
    A get item 2
    A get item 0
    A get item 1
    A get item 2
    A get item 0
    A get item 1
    A get item 2
    array([1, 2], dtype=int8)

i.e. the sequence is "turned into a concrete sequence" no less than 3 times.

Antony

2016-01-19 11:33 GMT-08:00 Stephan Sahm:
> just to not prevent it from the black hole - what about integrating
> fromiter into array? (see the post by Benjamin Root)
>
> for me personally, taking the first element for deducing the dtype would
> be a perfect default way to read generators. If one wants a specific
> other dtype, one could specify it like in the current fromiter method.
>
> On 15 December 2015 at 08:08, Stephan Sahm wrote:
>
>> I would like to further push Benjamin Root's suggestion:
>>
>> "Therefore, I think it is not out of the realm of reason that passing a
>> generator object and a dtype could then delegate the work under the hood
>> to np.fromiter()? I would even go so far as to raise an error if one
>> passes a generator without specifying dtype to np.array(). The point is
>> to reduce the number of entry points for creating numpy arrays."
>>
>> would this be ok?
>>
>> On Mon, Dec 14, 2015 at 6:50 PM Robert Kern wrote:
>>
>>> On Mon, Dec 14, 2015 at 5:41 PM, Benjamin Root wrote:
>>> >
>>> > Heh, never noticed that. Was it implemented more like a
>>> generator/iterator in older versions of Python?
>>>
>>> No, it predates generators and iterators so it has always had to be
>>> implemented like that.
>>>
>>> --
>>> Robert Kern
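[np.fromiter, mentioned upthread as the single-pass entry point, already provides the proposed generator-plus-dtype behavior — a sketch:]

```python
import numpy as np

# fromiter consumes the iterable exactly once; dtype is mandatory, and
# passing count up front lets numpy preallocate instead of growing.
gen = (i * i for i in range(5))
a = np.fromiter(gen, dtype=np.int64, count=5)
print(a)  # [ 0  1  4  9 16]
```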
Re: [Numpy-discussion] Proposal to add `weights` to `np.percentile` and `np.median`
See earlier discussion here: https://github.com/numpy/numpy/issues/6326

Basically, naïvely sorting may be faster than a not-so-optimized version of quickselect.

Antony

2016-02-15 21:49 GMT-08:00 Joseph Fox-Rabinovitz:
> I would like to add a `weights` keyword to `np.partition`,
> `np.percentile` and `np.median`. My reason for doing so is to allow
> `np.histogram` to process automatic bin selection with weights.
> Currently, weights are not supported for the automatic bin selection
> and would be difficult to support in `auto` mode without having
> `np.percentile` support a `weights` keyword. I suspect that there are
> many other uses for such a feature.
>
> I have taken a preliminary look at the C implementation of the
> partition functions that are the basis for `partition`, `median` and
> `percentile`. I think that it would be possible to add versions (or
> just extend the functionality of existing ones) that check the ratio
> of the weights below the partition point to the total sum of the
> weights instead of just counting elements.
>
> One of the main advantages of such an implementation is that it would
> allow any real weights to be handled correctly, not just integers.
> Complex weights would not be supported.
>
> The purpose of this email is to see if anybody objects, has ideas or
> cares at all about this proposal before I spend a significant amount
> of time working on it. For example, did I miss any functions in my
> list?
>
> Regards,
>
> -Joe
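[A pure-Python sketch of the weight-ratio idea — the helper name and the midpoint convention are editorial assumptions, not part of the proposal or the numpy API:]

```python
import numpy as np

def weighted_percentile(a, q, weights):
    """Weighted analogue of np.percentile (hypothetical helper, not numpy API).

    Instead of counting the elements below the partition point, compare the
    cumulative weight below it to the total weight, as the proposal suggests
    (here via a full sort rather than a partition).
    """
    a = np.asarray(a, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(a)
    a, w = a[order], w[order]
    # Place each sample at the midpoint of its weight mass, then interpolate.
    cum = (np.cumsum(w) - 0.5 * w) / np.sum(w)
    return np.interp(q / 100.0, cum, a)

print(weighted_percentile([1, 2, 3], 50, [1, 1, 1]))    # 2.0, the unweighted median
print(weighted_percentile([1, 2, 3], 50, [1, 1, 100]))  # pulled toward 3 by the heavy weight
```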
Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
Indeed:

In [1]: class C:
   ...:     def __getitem__(self, i):
   ...:         if i < 10:
   ...:             return i
   ...:         else:
   ...:             raise IndexError
   ...:     def __len__(self):
   ...:         return 10
   ...:

In [2]: np.array(C())
Out[2]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

(omitting __len__ results in the creation of an object array, consistently with the fact that the sequence protocol requires __len__).

Meanwhile, I found a new way to segfault numpy :-)

In [3]: class C:
   ...:     def __getitem__(self, i):
   ...:         if i < 10:
   ...:             return i
   ...:         else:
   ...:             raise IndexError
   ...:     def __len__(self):
   ...:         return 42
   ...:

In [4]: np.array(C())
Fatal Python error: Segmentation fault

2016-02-15 0:10 GMT-08:00 Nathaniel Smith <n...@pobox.com>:
> On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee <antony@berkeley.edu> wrote:
> > I wonder whether numpy is using the "old" iteration protocol (repeatedly
> > calling x[i] for increasing i until StopIteration is reached?) A quick
> > timing shows that it is indeed slower.
>
> Yeah, I'm pretty sure that np.array doesn't know anything about
> "iterable", just about "sequence" (calling x[i] for 0 <= i <
> x.__len__()).
>
> (See Sequence vs Iterable:
> https://docs.python.org/3/library/collections.abc.html)
>
> Personally I'd like it if we could eventually make it so np.array
> specifically looks for lists and only lists, because the way it has so
> many different fallbacks right now creates all sorts of confusion about
> which objects are elements. Compare:
>
> In [5]: np.array([(1, 2), (3, 4)]).shape
> Out[5]: (2, 2)
>
> In [6]: np.array([(1, 2), (3, 4)], dtype="i4,i4").shape
> Out[6]: (2,)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
I wonder whether numpy is using the "old" iteration protocol (repeatedly calling x[i] for increasing i until StopIteration is reached?) A quick timing shows that it is indeed slower. ... actually it's not even clear to me what qualifies as a sequence for `np.array`:

    class C:
        def __iter__(self):
            return iter(range(10))  # [0 ... 9] under the new iteration protocol
        def __getitem__(self, i):
            raise IndexError  # [] under the old iteration protocol

    np.array(C())
    ===> array(<__main__.C object at 0x7f3f2128>, dtype=object)

So how can np.array(range(...)) even work?

2016-02-14 22:21 GMT-08:00 Ralf Gommers <ralf.gomm...@gmail.com>:
>
> On Sun, Feb 14, 2016 at 10:36 PM, Charles R Harris
> <charlesr.har...@gmail.com> wrote:
>
>> On Sun, Feb 14, 2016 at 7:36 AM, Ralf Gommers <ralf.gomm...@gmail.com>
>> wrote:
>>
>>> On Sun, Feb 14, 2016 at 9:21 AM, Antony Lee <antony@berkeley.edu>
>>> wrote:
>>>
>>>> [quoted text clipped; the message is reproduced in full below]
>>>
>>> I think it's good to do something about this, but it's not clear what
>>> the exact proposal is. I could imagine one or both of:
>>>
>>> - special-case the range() object in array (and asarray/asanyarray?)
>>>   such that array(range(N)) becomes as fast as arange(N).
>>> - special-case all iterators, such that array(range(N)) becomes as
>>>   fast as deque(range(N))
>>
>> I think the last wouldn't help much, as numpy would still need to
>> determine dimensions and type. I assume that is one of the reasons
>> sparse itself doesn't do that.
>
> Not orders of magnitude, but this shows that there's something to optimize
> for iterators:
>
> In [1]: %timeit np.array(range(10))
> 100 loops, best of 3: 14.9 ms per loop
>
> In [2]: %timeit np.array(list(range(10)))
> 100 loops, best of 3: 9.68 ms per loop
>
> Ralf
Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
I was thinking (1) (special-casing range()); however (2) may be more generally applicable and useful.

Antony

2016-02-14 6:36 GMT-08:00 Ralf Gommers <ralf.gomm...@gmail.com>:
>
> On Sun, Feb 14, 2016 at 9:21 AM, Antony Lee <antony@berkeley.edu> wrote:
>
>> [quoted text clipped; the message is reproduced in full below]
>
> I think it's good to do something about this, but it's not clear what the
> exact proposal is. I could imagine one or both of:
>
> - special-case the range() object in array (and asarray/asanyarray?)
>   such that array(range(N)) becomes as fast as arange(N).
> - special-case all iterators, such that array(range(N)) becomes as fast
>   as deque(range(N))
>
> or yet something else?
>
> Ralf
Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
re: no reason why...
This has nothing to do with Python2/Python3 (I personally stopped using Python2 at least 3 years ago.) Let me put it this way instead: if Python3's "range" (or Python2's "xrange") was not a builtin type but a type provided by numpy, I don't think it would be controversial at all to provide an `__array__` special method to efficiently convert it to an ndarray. It would be the same if `np.array` used a `functools.singledispatch` dispatcher rather than an `__array__` special method (which is obviously not possible for chronological reasons).

re: iterable vs iterator: check for the presence of the __next__ special method (or isinstance(x, Iterator) vs. isinstance(x, Iterable) and not isinstance(x, Iterator))

Antony

2016-02-13 18:48 GMT-08:00 <josef.p...@gmail.com>:
>
> On Sat, Feb 13, 2016 at 9:43 PM, <josef.p...@gmail.com> wrote:
>
>> On Sat, Feb 13, 2016 at 8:57 PM, Antony Lee <antony@berkeley.edu> wrote:
>>
>>> [quoted text clipped; the original message is reproduced in full below]
>>
>> IMO: I don't see a reason why this should be supported. There is
>> np.arange after all for this usecase, and from_iter.
>> range and the other guys are iterators, and in several cases we can use
>> larange = list(range(...)) as a short cut to get a python list, for
>> python 2/3 compatibility.
>>
>> I think this might be partially a learning effect in the python 2 to 3
>> transition. After using almost only python 3 for maybe a year, I don't
>> think it's difficult to remember the differences when writing code that
>> is py 2.7 and py 3.x compatible.
>>
>> It's just **another** thing to watch out for if milliseconds matter in
>> your application.
>>
>> Josef
>
> side question: Is there a simple way to distinguish an iterator or
> generator from an iterable data structure?
>
> Josef
[Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
Compare (on Python3 -- for Python2, read "xrange" instead of "range"):

In [2]: %timeit np.array(range(100), np.int64)
10 loops, best of 3: 156 ms per loop

In [3]: %timeit np.arange(100, dtype=np.int64)
1000 loops, best of 3: 853 µs per loop

Note that while iterating over a range is not very fast, it is still much better than the array creation:

In [4]: from collections import deque

In [5]: %timeit deque(range(100), 1)
10 loops, best of 3: 25.5 ms per loop

On one hand, special cases are awful. On the other hand, the range builtin is probably important enough to deserve a special case to make this construction faster. Or not? I initially opened this as https://github.com/numpy/numpy/issues/7233 but it was suggested there that this should be discussed on the ML first.

(The real issue which prompted this suggestion: I was building sparse matrices using scipy.sparse.csc_matrix with some indices specified using range, and that construction step turned out to take a significant portion of the time because of the calls to np.array).

Antony
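[Until/unless a special case lands, the fast paths are np.arange for this case and np.fromiter for a genuine iterable; all three spellings produce equal arrays — a sketch:]

```python
import numpy as np

n = 1000
via_range = np.array(range(n), np.int64)                 # slow path: generic sequence protocol
via_arange = np.arange(n, dtype=np.int64)                # fast path
via_fromiter = np.fromiter(range(n), np.int64, count=n)  # single pass, preallocated

# All three agree element-for-element; only the construction cost differs.
assert (via_range == via_arange).all() and (via_arange == via_fromiter).all()
```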
[Numpy-discussion] Fixing the dtype of np.full's return value
Hi all,

The docstring of np.full indicates that the dtype of the result is `np.array(fill_value).dtype`, as long as the keyword argument `dtype` itself is not set. This is actually not the case: the current implementation always returns a float array when `dtype` is not set, see e.g.

In [1]: np.full(1, 1)
Out[1]: array([ 1.])

In [2]: np.full(1, None)
Out[2]: array([ nan])

In [3]: np.full(1, None).dtype
Out[3]: dtype('float64')

In [4]: np.array(None)
Out[4]: array(None, dtype=object)

The note about the dtype of the return value was actually explicitly discussed in https://github.com/numpy/numpy/pull/2875, but the tests failed to cover the case where the `dtype` argument is not passed. We could either change the docstring to match the current behavior, or fix the behavior to match what the docstring says (my preference). @njsmith mentioned in https://github.com/numpy/numpy/issues/6366 that this may be acceptable as a bug fix, as "it's a very new function so there probably aren't many people relying on it" (it was introduced in 1.8).

I guess the options are:
- Fix the behavior outright and squeeze this in 1.10 as a bugfix (my preference).
- Emit a warning in 1.10, fix in 1.11.
- Do nothing for 1.10, warn in 1.11, fix in 1.12 (at that point the argument of `np.full` being a very new function starts becoming invalid...).
- Change the docstring.

Thoughts?

Antony
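[Editorial note: in NumPy releases after this discussion the docstring's behavior won — the dtype is inferred from the fill value when `dtype` is not given. A sketch of that (now current) behavior:]

```python
import numpy as np

# dtype inferred from fill_value when dtype is not given:
print(np.full(3, 1).dtype)                     # an integer dtype (e.g. int64)
print(np.full(3, 1.0).dtype)                   # float64
# An explicit dtype always wins:
print(np.full(3, 1, dtype=np.float32).dtype)   # float32
```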
Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState
2015-05-29 14:06 GMT-07:00 Antony Lee antony@berkeley.edu:
> A proof-of-concept implementation, still missing tests, is tracked as
> #5911. It includes the patch proposed in #5158 as an example of how to
> include an improved version of random.choice. Tests are in now (whether
> we should bundle in pickles of old versions to make sure they are still
> unpickled correctly and outputs of old random streams to make sure they
> are still reproduced is a good question, though). Comments welcome.

Kindly bumping the issue.

Antony
Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState
A proof-of-concept implementation, still missing tests, is tracked as #5911. It includes the patch proposed in #5158 as an example of how to include an improved version of random.choice. Tests are in now (whether we should bundle in pickles of old versions to make sure they are still unpickled correctly and outputs of old random streams to make sure they are still reproduced is a good question, though). Comments welcome.

Antony
Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState
Thanks to Nathaniel who has indeed clarified my intent, i.e. the global RandomState should use the latest implementation, unless explicitly seeded. More generally, the `RandomState` constructor is just a thin wrapper around `seed` with the same signature, so one can swap the version of the global functions with a call to `np.random.seed(version=...)`. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState
2015-05-24 13:30 GMT-07:00 Sturla Molden sturla.mol...@gmail.com: On 24/05/15 10:22, Antony Lee wrote: Comments, and help for writing tests (in particular to make sure backwards compatibility is maintained) are welcome. I have one comment, and that is what makes random numbers so special? This applies to the rest of NumPy too: fixing a bug can sometimes change the output of a function. Personally I think we should only make guarantees about the data types, array shapes, and things like that, but not about the values. Those who need a particular version of NumPy for exact reproducibility should install the version of Python and NumPy they need. That is why virtual environments exist. I personally agree with this point of view (see the original discussion in #5299, for example); if it were only up to me I'd at least make RandomState(seed) default to the latest version rather than the original one (whether to keep the old versions around is another question). On the other hand, I see that this long-standing debate has sometimes prevented obvious improvements from being added for years (e.g. a patch for Ziggurat normal variates has been lying around since 2010), or led to potential API duplication in order to fix some clearly undesirable behavior (dirichlet returning nan being described as "in a strict sense not really a bug"(!)), so I'm willing to compromise to get this moving forward. Antony ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState
Hi, As mentioned in
#1450: Patch with Ziggurat method for Normal distribution
#5158: ENH: More efficient algorithm for unweighted random choice without replacement
#5299: using `random.choice` to sample integers in a large range
#5851: Bug in np.random.dirichlet for small alpha parameters
some methods on np.random.RandomState are implemented either non-optimally (#1450, #5158, #5299) or have outright bugs (#5851), but cannot be easily changed due to backwards compatibility concerns. While some have suggested new methods deprecating the old ones (see e.g. #5872), some consensus has formed around the following ideas (see #5299 for the original discussion, followed by private discussions with @njsmith):
- Backwards compatibility should only be provided to those who were explicitly instantiating a seeded RandomState object, or reseeding a RandomState object to a given value, and drawing variates from it: using the global methods (or a None-seeded RandomState) was already non-reproducible anyways, as e.g. other libraries could be drawing variates from the global RandomState (of which the free functions in np.random are actually methods). Thus, the global RandomState object should use the latest implementation of the methods.
- RandomState(seed) and r = RandomState(...); r.seed(seed) should offer backwards-compatibility guarantees (see e.g. https://docs.python.org/3.4/library/random.html#notes-on-reproducibility).
As such, we propose the following improvements to the API:
- RandomState gains a (keyword-only) parameter, version, also accessible as a read-only attribute. This indicates the version of the methods on the object. The current version of RandomState is retroactively assigned version 0. The latest available version is available as np.random.LATEST_VERSION. Backwards-incompatible improvements to RandomState methods can be introduced, but increment LATEST_VERSION.
- The global RandomState is instantiated as RandomState(version=LATEST_VERSION).
- RandomState() and rs.seed() set the version to LATEST_VERSION.
- RandomState(seed[!=None]) and rs.seed(seed[!=None]) set the version to 0.
A proof-of-concept implementation, still missing tests, is tracked as #5911. It includes the patch proposed in #5158 as an example of how to include an improved version of random.choice. Comments, and help for writing tests (in particular to make sure backwards compatibility is maintained) are welcome. Antony Lee ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
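To make the proposed seeding rules concrete, here is a toy sketch (hypothetical names, not the actual #5911 implementation) of how the version would be selected: unseeded states track the latest version, explicitly seeded ones default to version 0 for reproducibility.

```python
import numpy as np

LATEST_VERSION = 1  # hypothetical; version 0 is the current RandomState

class VersionedRandomState:
    """Toy sketch of the proposed API -- not NumPy code."""

    def __init__(self, seed=None, *, version=None):
        self.seed(seed, version=version)

    def seed(self, seed=None, *, version=None):
        if version is None:
            # unseeded -> latest methods; seeded -> backwards-compatible v0
            version = LATEST_VERSION if seed is None else 0
        if not 0 <= version <= LATEST_VERSION:
            raise ValueError("unknown version: %r" % (version,))
        self.version = version
        self._state = np.random.RandomState(seed)

assert VersionedRandomState().version == LATEST_VERSION
assert VersionedRandomState(12345).version == 0
assert VersionedRandomState(12345, version=LATEST_VERSION).version == LATEST_VERSION
```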
Re: [Numpy-discussion] Automatic number of bins for numpy histograms
Another improvement would be to make sure, for integer-valued datasets, that all bins cover the same number of integers, as it is otherwise easy to end up with some bins effectively wider than others: hist(np.random.randint(11, size=1)) shows a peak in the last bin, as it covers both 9 and 10. Antony 2015-04-13 5:02 GMT-07:00 Neil Girdhar mistersh...@gmail.com: Can I suggest that we instead add the P-square algorithm for the dynamic calculation of histograms? ( http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf ) This is already implemented in C++'s boost library ( http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp ) I implemented it in Boost Python as a module, which I'm happy to share. This is much better than fixed-width histograms in practice. Rather than adjusting the number of bins, it adjusts what you really want, which is the resolution of the bins throughout the domain. Best, Neil On Sun, Apr 12, 2015 at 4:02 AM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote: On Sun, Apr 12, 2015 at 12:19 AM, Varun nayy...@gmail.com wrote: http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb Long story short, histogram visualisations that depend on numpy (such as matplotlib, or nearly all of them) have poor default behaviour, as I have to constantly play around with the number of bins to get a good idea of what I'm looking at. bins=10 works ok for up to 1000 points or very normal data, but performs poorly for anything else, and doesn't account for variability either. I don't have a method easily available to scale the number of bins given the data.
R doesn't suffer from these problems and provides methods for use with its hist function. I would like to provide similar functionality for matplotlib, to at least provide some kind of good starting point, as histograms are very useful for initial data discovery. The notebook above provides an explanation of the problem as well as some proposed alternatives. Use different datasets (type and size) to see the performance of the suggestions. All of the methods proposed exist in R and in the literature. I've put together an implementation to add this new functionality, but am hesitant to make a pull request as I would like some feedback from a maintainer before doing so. +1 on the PR. +1 as well. Unfortunately we can't change the default of 10, but a number of string methods, with a bins=auto or some such name prominently recommended in the docstring, would be very good to have. Ralf ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
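The integer-binning pitfall mentioned at the top of this thread is easy to reproduce with np.histogram directly (the sample size below is just for illustration):

```python
import numpy as np

# With the default 10 bins over integer data in {0, ..., 10}, every edge
# falls on an integer, and the last (closed) bin [9, 10] collects samples
# for both 9 and 10 -- roughly twice as many as any other bin.
rng = np.random.RandomState(0)
data = rng.randint(11, size=10000)
counts, edges = np.histogram(data, bins=10)
print(edges)                             # unit-width bins: 0., 1., ..., 10.
print(counts[-1] / counts[:-1].mean())   # ~2: the spurious peak in the last bin
```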
Re: [Numpy-discussion] edge-cases of ellipsis indexing
I see, thanks! 2015-01-05 2:14 GMT-07:00 Sebastian Berg sebast...@sipsolutions.net: On Mo, 2015-01-05 at 14:13 +0530, Maniteja Nandana wrote: Hi Antony, I am not sure whether the following section in the documentation is relevant to the behavior you were referring to: "When an ellipsis (...) is present but has no size (i.e. replaces zero :) the result will still always be an array. A view if no advanced index is present, otherwise a copy." Exactly. There are actually three forms of indexing to distinguish:
1. Indexing with integers (also scalar arrays) matching the number of dimensions. This will return a *scalar*.
2. Slicing, etc., which returns a view. This also occurs as soon as there is an ellipsis in there (even if it replaces 0 `:`). You should see it as a feature to get a view if the result might be a scalar otherwise ;)!
3. Advanced indexing, which cannot be view based and returns a copy.
- Sebastian
Here, ... replaces zero :. "Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view)." And I think it is a view that is returned in this case.
>>> a = array([1])
>>> a
array([1])
>>> a[:,0]  # zero : are present
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array
>>> a[...,0] = 2
>>> a
array([2])
>>> a[0] = 3
>>> a
array([3])
>>> a[(0,)] = 4
>>> a
array([4])
Hope I helped. Cheers, N.Maniteja. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] edge-cases of ellipsis indexing
While trying to reproduce various fancy indexings for astropy's FITS sections (a loaded-on-demand array), I found the following interesting behavior:
>>> np.array([1])[..., 0]
array(1)
>>> np.array([1])[0]
1
>>> np.array([1])[(0,)]
1
The docs say "Ellipsis expand to the number of : objects needed to make a selection tuple of the same length as x.ndim.", so it's not totally clear to me how to explain that difference in the results. Antony ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
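A short session illustrating the distinction discussed in this thread: the ellipsis form returns a 0d array which is a view, while plain integer indexing returns a scalar.

```python
import numpy as np

a = np.array([1])
print(repr(a[0]))        # a numpy scalar (np.int64 on most platforms)
print(repr(a[..., 0]))   # array(1): a 0d ndarray, not a scalar

v = a[..., 0]            # the 0d result is a *view* into a ...
v[()] = 5                # ... so writing through it ...
print(a)                 # ... modifies a itself
```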
Re: [Numpy-discussion] truthiness of object arrays
On Python3, __nonzero__ is never defined (always raises an AttributeError), even after calling __bool__. 2014-11-13 5:24 GMT-08:00 Alan G Isaac alan.is...@gmail.com: On 11/13/2014 1:19 AM, Antony Lee wrote: t.__bool__() also returns True But t.__nonzero__() is being called in the `if` test. The question is: is the difference between `__nonzero__` and `__bool__` intentional? By the way, there has been a change in behavior. For example, in 1.7.1 if you called `t.__bool__()` it raised an attribute error -- unless one first called `t.__nonzero__()` and then called `t.__bool__()`, which was of course very weird and needed to be fixed. Maybe (?) not like this. In fact the oddity probably remains but moved. In 1.9.0 I see this:
>>> np.__version__
'1.9.0'
>>> t = np.array(None); t[()] = np.array([None, None])
>>> t.__nonzero__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute '__nonzero__'
>>> t.__bool__()
True
>>> t.__nonzero__()
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Alan Isaac ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] truthiness of object arrays
Dunno, seems unlikely that something changed with Python 3.4.2...
$ python --version
Python 3.4.2
$ python -c 'import numpy as np; print(np.__version__); t = np.array(None); t[()] = np.array([None, None]); t.__bool__(); t.__nonzero__()'
1.9.0
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute '__nonzero__'
2014-11-13 10:05 GMT-08:00 Alan G Isaac alan.is...@gmail.com: On 11/13/2014 12:37 PM, Antony Lee wrote: On Python3, __nonzero__ is never defined (always raises an AttributeError), even after calling __bool__. The example I posted was Python 3.4.1 with numpy 1.9.0. fwiw, Alan Isaac
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__version__
'1.9.0'
>>> t = np.array(None); t[()] = np.array([None, None])
>>> t.__nonzero__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute '__nonzero__'
>>> t.__bool__()
True
>>> t.__nonzero__()
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] truthiness of object arrays
I am puzzled by the following (numpy 1.9.0, python 3.4.2):
In [1]: t = array(None); t[()] = array([None, None])  # construct a 0d array of dtype object, containing a single numpy array with 2 elements
In [2]: bool(t)
Out[2]: True
In [3]: if t: pass
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I thought that `if t` simply calls `bool(t)`, but apparently this is not even the case... Antony ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Broadcasting with np.logical_and.reduce
I am not using asarray here. Sorry, but I don't see how this is relevant -- my comparison with np.add.reduce is simply that when a list of float arrays is passed to np.add.reduce, broadcasting happens as usual, but not when a list of bool arrays is passed to np.logical_and.reduce. 2014-09-12 0:48 GMT-07:00 Sebastian Berg sebast...@sipsolutions.net: On Do, 2014-09-11 at 22:54 -0700, Antony Lee wrote: Hi, I thought that ufunc.reduce performs broadcasting, but it seems a bit confused by boolean arrays (ipython with pylab mode on):
In [1]: add.reduce([array([1, 2]), array([1])])
Out[1]: array([2, 3])
In [2]: logical_and.reduce([array([True, False], dtype=bool), array([True], dtype=bool)])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-bedbab4c13e1> in <module>()
----> 1 logical_and.reduce([array([True, False], dtype=bool), array([True], dtype=bool)])
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Am I missing something here? `np.asarray([array([1, 2]), array([1])])` is an object array, not a boolean array. You probably want to concatenate them. - Sebastian Thanks, Antony ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Broadcasting with np.logical_and.reduce
I see. I went back to the documentation of ufunc.reduce and this is not explicitly mentioned although a posteriori it makes sense; perhaps this can be made clearer there? Antony 2014-09-12 2:22 GMT-07:00 Robert Kern robert.k...@gmail.com: On Fri, Sep 12, 2014 at 10:04 AM, Antony Lee antony@berkeley.edu wrote: I am not using asarray here. Sorry, but I don't see how this is relevant -- my comparison with np.add.reduce is simply that when a list of float arrays is passed to np.add.reduce, broadcasting happens as usual, but not when a list of bool arrays is passed to np.logical_and.reduce. But np.logical_and.reduce() *does* use asarray() when it is given a list object (all ufunc .reduce() methods do this). In both cases, you get a dtype=object array. This means that the ufunc will use the dtype=object inner loop, not the dtype=bool inner loop. For np.add, this isn't a problem. It just calls the __add__() method on the first object which, since it's an ndarray, calls np.add() again to do the actual work, this time using the appropriate dtype inner loop for the inner objects. But np.logical_and is different! For the dtype=object inner loop, it directly calls bool(x) on each item of the object array; it doesn't defer to any other method that might do the computation. bool(almost_any_ndarray) raises the ValueError that you saw. np.logical_and.reduce([x, y]) is not the same as np.logical_and(x, y). 
You can see how the dtype=object inner loop of np.logical_and() works by directly constructing dtype=object shape-() arrays:
[~]
|14> x = np.array(None, dtype=object)

[~]
|15> x[()] = np.array([True, False])

[~]
|16> x
array(array([ True, False], dtype=bool), dtype=object)

[~]
|17> y = np.array(None, dtype=object)

[~]
|18> y[()] = np.array([[True], [False]])

[~]
|19> y
array(array([[ True],
       [False]], dtype=bool), dtype=object)

[~]
|20> np.logical_and(x, y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-17705aa17a6f> in <module>()
----> 1 np.logical_and(x, y)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
-- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
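A consequence of the above is that the reduction works fine once the operands are combined into a single boolean array before reduce() ever sees them; for instance (using np.stack, available in current NumPy, purely as an illustration):

```python
import numpy as np

x = np.array([True, False])
y = np.array([True])

# The binary ufunc broadcasts as expected (np.logical_and.reduce([x, y])
# would instead build a dtype=object array and fail as discussed above):
print(np.logical_and(x, y))

# For several operands, broadcast explicitly and stack into one bool
# array, so reduce() operates on dtype=bool instead of dtype=object:
stacked = np.stack(np.broadcast_arrays(x, y, np.array([True, True])))
print(np.logical_and.reduce(stacked, axis=0))
```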
Re: [Numpy-discussion] Broadcasting with np.logical_and.reduce
I read the Methods section of the ufuncs doc page ( http://docs.scipy.org/doc/numpy/reference/ufuncs.html#methods ) again and I think this could be made clearer simply by replacing the first sentence, from "All ufuncs have four methods." to "All ufuncs have five methods that operate on array-like objects." (yes, there's also `at`, which seems to have been added later to the doc...). This would make it somewhat clearer that logical_and.reduce([array([True, False], dtype=bool), array([True], dtype=bool)]) interprets the single list argument as an array-like (of dtype object) rather than as an iterable over which to reduce (as python's builtin reduce would). In fact there is another point in that paragraph that could be improved; namely, axis does not have to be an integer for reduce. Antony 2014-09-12 10:46 GMT-07:00 Robert Kern robert.k...@gmail.com: On Fri, Sep 12, 2014 at 5:46 PM, Robert Kern robert.k...@gmail.com wrote: On Fri, Sep 12, 2014 at 5:44 PM, Antony Lee antony@berkeley.edu wrote: I see. I went back to the documentation of ufunc.reduce and this is not explicitly mentioned although a posteriori it makes sense; perhaps this can be made clearer there? Please recommend the documentation you would like to see. Specifically, the behavior I described is the interaction of several different things, but you don't mention which part of it is not explicitly mentioned. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Broadcasting with np.logical_and.reduce
Hi, I thought that ufunc.reduce performs broadcasting, but it seems a bit confused by boolean arrays (ipython with pylab mode on):
In [1]: add.reduce([array([1, 2]), array([1])])
Out[1]: array([2, 3])
In [2]: logical_and.reduce([array([True, False], dtype=bool), array([True], dtype=bool)])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-bedbab4c13e1> in <module>()
----> 1 logical_and.reduce([array([True, False], dtype=bool), array([True], dtype=bool)])
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Am I missing something here? Thanks, Antony ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] multiprocessing, numpy and 32-64 bit cohabitation
Thanks a lot! Antony 2013/9/20 Henry Gomersall h...@cantab.net On 18/09/13 01:51, Antony Lee wrote: While I realize that this is certainly tweaking multiprocessing beyond its specifications, I would like to use it on Windows to start a 32-bit Python process from a 64-bit Python process (use case: I need to interface with a 64-bit DLL and use an extension (pyFFTW) for which I can only find a 32-bit compiled version (yes, I could try to install MSVC and compile it myself but I'm trying to avoid that...)) There is now a release on PyPI including installers for both 32- and 64-bit Python 2.7, 3.2 and 3.3. The long double schemes are ignored as on 64-bit windows that type simply maps to double (though it should be seamless from the Python/Numpy end). All tests satisfied :) (that was some work!) Cheers, Henry ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] multiprocessing, numpy and 32-64 bit cohabitation
2013/9/19 Robert Kern robert.k...@gmail.com On Thu, Sep 19, 2013 at 5:58 PM, Antony Lee antony@berkeley.edu wrote: Henry: thanks a lot, that would be very appreciated regardless of whether I end up using it in this specific project or not. Other replies below. Antony 2013/9/19 Robert Kern robert.k...@gmail.com On Thu, Sep 19, 2013 at 2:40 AM, Antony Lee antony@berkeley.edu wrote: Thanks, I didn't know that multiprocessing Managers could be used with processes not started by multiprocessing itself... I will give them a try. I just need to compute FFTs, but speed is a real issue for me (I am using the results for real-time feedback). I am pretty sure that the overhead of communicating a large array from one process to another will vastly overwhelm any speed gains you get by using pyFFTW over numpy.fft. I would have hoped that the large arrays are simply written (from the beginning) to shared memory (what multiprocessing.sharedctypes.Array seems to do(?)) and that interprocess communication would be cheap enough (but what do I know about that). It certainly won't be automatic just by passing a numpy array to the manager. You will have to manually create the shared memory, pass its handle to the other process, and copy into it. But even the copy of the array may overwhelm the speed gains between PyFFTW and numpy.fft. If you can set it up such that the subprocess owns the shared memory for both input and output and the GUI process always writes into the input shared array directly and reads out the output shared array, then might work out okay. This works well when the inputs/outputs are always the same size. The arrays would always be the same size, and there is no array copy involved, as (I think that) I can have the C dll directly write whatever data needs to be analyzed to the shared memory array -- basically what you're suggesting. 
-- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
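The shared-memory layout discussed above can be sketched like this (single-process here just for illustration; in the real setup each side would wrap its own handle to the same buffers, and the C DLL would write directly into the input buffer):

```python
import ctypes
import numpy as np
from multiprocessing import sharedctypes

N = 8  # the arrays are always the same size, so allocate them once up front
in_buf = sharedctypes.RawArray(ctypes.c_double, N)
out_buf = sharedctypes.RawArray(ctypes.c_double, N)

# No copy: the ndarrays are views onto the shared buffers.
in_arr = np.frombuffer(in_buf, dtype=np.float64)
out_arr = np.frombuffer(out_buf, dtype=np.float64)

in_arr[:] = np.arange(N)                 # the GUI process writes the input ...
out_arr[:] = np.abs(np.fft.fft(in_arr))  # ... the worker computes the output
print(out_arr[0])                        # DC component: 0 + 1 + ... + 7 = 28
```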
Re: [Numpy-discussion] multiprocessing, numpy and 32-64 bit cohabitation
Henry: thanks a lot, that would be very appreciated regardless of whether I end up using it in this specific project or not. Other replies below. Antony 2013/9/19 Robert Kern robert.k...@gmail.com On Thu, Sep 19, 2013 at 2:40 AM, Antony Lee antony@berkeley.edu wrote: Thanks, I didn't know that multiprocessing Managers could be used with processes not started by multiprocessing itself... I will give them a try. I just need to compute FFTs, but speed is a real issue for me (I am using the results for real-time feedback). I am pretty sure that the overhead of communicating a large array from one process to another will vastly overwhelm any speed gains you get by using pyFFTW over numpy.fft. I would have hoped that the large arrays are simply written (from the beginning) to shared memory (what multiprocessing.sharedctypes.Array seems to do(?)) and that interprocess communication would be cheap enough (but what do I know about that). To be honest I don't know yet if the FFTs are going to be the limiting step but I thought I may as well give pyFFTW a try and ran into that issue... In that case, thinking about multiprocessing or even pyFFTW is far too premature. Implement your code with numpy.fft and see what performance you actually get. There is another (and, in fact, main) reason for me to use multiprocessing: the main app runs a GUI and running the data analysis in the same process just makes it painfully slow (I have tried that). Instead, running the data analysis in a separate process keeps the GUI responsive. Now whether the data analysis process should use numpy.fft or pyFFTW is a separate question; I realize that the gains from pyFFTW may probably be negligible compared to the other costs (... including the costs of tweaking multiprocessing beyond its specifications) but I was just giving it a try when I ran into the issue and was just puzzled by the error message I had never seen before. 
-- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] multiprocessing, numpy and 32-64 bit cohabitation
Thanks, I didn't know that multiprocessing Managers could be used with processes not started by multiprocessing itself... I will give them a try. I just need to compute FFTs, but speed is a real issue for me (I am using the results for real-time feedback). To be honest I don't know yet if the FFTs are going to be the limiting step but I thought I may as well give pyFFTW a try and ran into that issue... Antony 2013/9/18 Robert Kern robert.k...@gmail.com On Wed, Sep 18, 2013 at 1:51 AM, Antony Lee antony@berkeley.edu wrote: Hi all, While I realize that this is certainly tweaking multiprocessing beyond its specifications, I would like to use it on Windows to start a 32-bit Python process from a 64-bit Python process (use case: I need to interface with a 64-bit DLL and use an extension (pyFFTW) for which I can only find a 32-bit compiled version (yes, I could try to install MSVC and compile it myself but I'm trying to avoid that...)) Just use subprocess to start up the 32-bit Python. If you want to use the multiprocessing tools for communicating data, use a Manager server in the 32-bit Python to communicate over a socket. http://docs.python.org/2/library/multiprocessing#managers http://docs.python.org/2/library/multiprocessing#using-a-remote-manager It is possible that this won't work if the protocol assumes that the bitness is the same between server and client (e.g. struct.pack('Q', ...)), but I suspect this is not the case. You may also consider writing a small server using pyzmq or similar. I am guessing that you are just calling one function from pyFFTW and getting the result back. A simple REQ/REP server is easy to write with pyzmq. Do you need to use pyFFTW for some specific functionality that is not available in numpy.fft or scipy.fftpack? 
-- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] multiprocessing, numpy and 32-64 bit cohabitation
Hi all, While I realize that this is certainly tweaking multiprocessing beyond its specifications, I would like to use it on Windows to start a 32-bit Python process from a 64-bit Python process (use case: I need to interface with a 64-bit DLL and use an extension (pyFFTW) for which I can only find a 32-bit compiled version (yes, I could try to install MSVC and compile it myself but I'm trying to avoid that...)) In fact, this is easy to do by using multiprocessing.set_executable (...while that may not be its original role):

import multiprocessing as mp
import imp, site, sys

if "32" in sys.executable:  # checking for my 32-bit Python install
    del sys.path[1:]  # recompute sys.path
    print(sys.path)
    site.main()
    print(sys.path)  # now points to the 32-bit site-packages

import numpy

if __name__ == '__main__':
    mp.set_executable(sys.executable.replace("33", "33-32"))  # path of my 32-bit Python install
    mp.Process(target=lambda: None).start()

The sys.path modifications are needed as otherwise the child process inherits the parent's sys.path and importing numpy (from the 64-bit path) fails as it "is not a valid Win32 application", complains Python (rightly). However, even after the sys.path modifications, the numpy import fails with the following error message (that I had never seen before; sorry, I can't copy-paste from the Windows command prompt...):

from . import multiarray  # <- numpy/core/__init__.py, line 5
SystemError: initialization of multiarray raised an unreported exception

Any hints as to how this could be fixed would be most welcome. Thanks in advance, Antony ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Python3, genfromtxt and unicode
Sure, I will. Right now my solution is to use genfromtxt once with bytes and auto-dtype detection, then modify the resulting dtype, replacing bytes with unicode, and use that new dtype for a second round of genfromtxt. A bit awkward, but that gets the job done. Antony Lee 2012/5/1 Charles R Harris charlesr.har...@gmail.com On Fri, Apr 27, 2012 at 8:17 PM, Antony Lee antony@berkeley.edu wrote: With bytes fields, genfromtxt(dtype=None) sets the sizes of the fields to the largest number of chars (npyio.py line 1596), but it doesn't do the same for unicode fields, which is a pity. See example below. I tried to change npyio.py around line 1600 to add that but it didn't work; from my limited understanding the problem comes earlier, in the way StringBuilder is defined(?). Antony Lee

import io, numpy as np
s = io.BytesIO()
s.write(b"abc 1\ndef 2")
s.seek(0)
t = np.genfromtxt(s, dtype=None)  # (or converters={0: bytes})
print(t, t.dtype)
# -> [(b'a', 1) (b'b', 2)] [('f0', '|S1'), ('f1', 'i8')]
s.seek(0)
t = np.genfromtxt(s, dtype=None, converters={0: lambda s: s.decode("utf-8")})
print(t, t.dtype)
# -> [('', 1) ('', 2)] [('f0', 'U0'), ('f1', 'i8')]

Could you open a ticket for this? Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
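The dtype-widening workaround described above can be sketched as follows (written against current NumPy; the names and the astype step are just one way to do the conversion, not the exact code I use):

```python
import io
import numpy as np

data = b"abc 1\ndefg 2\n"

# First pass: autodetection sizes the *bytes* string fields correctly.
t = np.genfromtxt(io.BytesIO(data), dtype=None)

# Widen each bytes field to a unicode field of the same length ...
widened = np.dtype([(name, 'U%d' % t.dtype[name].itemsize)
                    if t.dtype[name].kind == 'S' else (name, t.dtype[name])
                    for name in t.dtype.names])
# ... and convert field by field (numpy's S -> U cast handles the decoding).
t2 = t.astype(widened)
print(t2.dtype)  # the string field is now a 'U' field of the right width
```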
[Numpy-discussion] Python3, genfromtxt and unicode
With bytes fields, genfromtxt(dtype=None) sets the sizes of the fields to the largest number of chars (npyio.py line 1596), but it doesn't do the same for unicode fields, which is a pity. See example below. I tried to change npyio.py around line 1600 to add that but it didn't work; from my limited understanding the problem comes earlier, in the way StringBuilder is defined(?). Antony Lee

import io, numpy as np
s = io.BytesIO()
s.write(b"abc 1\ndef 2")
s.seek(0)
t = np.genfromtxt(s, dtype=None)  # (or converters={0: bytes})
print(t, t.dtype)
# -> [(b'a', 1) (b'b', 2)] [('f0', '|S1'), ('f1', 'i8')]
s.seek(0)
t = np.genfromtxt(s, dtype=None, converters={0: lambda s: s.decode("utf-8")})
print(t, t.dtype)
# -> [('', 1) ('', 2)] [('f0', 'U0'), ('f1', 'i8')]

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] unicode string for specifying dtype
I just ran into the following:
>>> np.dtype(u"f4")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: data type not understood
Is that the expected behaviour? Thanks in advance, Antony Lee ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion