Re: [Numpy-discussion] Silent Broadcasting considered harmful

2015-02-08 Thread Eelco Hoogendoorn
I personally have used Octave and/or Numpy for several years now and never ever needed broadcasting. But since it is still there, there will be many users who need it, there will be some use for it. Uhm, yeah, there is some use for it. I'm all for explicit over implicit, but personally current
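To make the "silent" part concrete, here is a minimal NumPy sketch (not from the thread) of a subtraction that broadcasts where a shape mismatch might have been meant as an error. `strict_sub` is a hypothetical helper showing the explicit alternative.

```python
import numpy as np

a = np.arange(3)           # shape (3,)
b = np.arange(3)[:, None]  # shape (3, 1), e.g. a column axis kept by accident

# Broadcasting silently expands both operands to shape (3, 3) instead of raising.
diff = a - b

def strict_sub(x, y):
    """Hypothetical helper: refuse to broadcast, require identical shapes."""
    if x.shape != y.shape:
        raise ValueError(f"shape mismatch: {x.shape} vs {y.shape}")
    return x - y

print(diff.shape)        # (3, 3)
print(strict_sub(a, a))  # [0 0 0]
```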

Re: [Numpy-discussion] Optimizing numpy's einsum expression (again)

2015-01-16 Thread Eelco Hoogendoorn
Thanks for taking the time to think about this; good work. Personally, I don't think a factor 5 memory overhead is much to sweat over. The most complex einsum I have ever needed in a production environment was 5/6 terms, and for what this anecdote is worth, speed was a far bigger concern to me
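For reference, NumPy 1.12 and later expose the contraction-order search discussed here through the `optimize` argument of `np.einsum` and through `np.einsum_path`. A small sketch with arbitrary shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((10, 20))
B = rng.random((20, 30))
C = rng.random((30, 5))

# Ask einsum for a contraction order (a list of pairwise steps) plus a text report.
path, report = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='greedy')
print(report)

# Reuse the precomputed path so repeated calls skip the search.
result = np.einsum('ij,jk,kl->il', A, B, C, optimize=path)
```

The `report` string lists the chosen pairwise contractions, naive versus optimized FLOP estimates, and the largest intermediate, which is where the memory-versus-speed tradeoff shows up.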

Re: [Numpy-discussion] Sorting refactor

2015-01-16 Thread Eelco Hoogendoorn
I don't know if there is a general consensus or guideline on these matters, but I am personally not entirely charmed by the use of behind-the-scenes parallelism, unless explicitly requested. Perhaps an algorithm can be made faster, but often these multicore algorithms are also less efficient, and

Re: [Numpy-discussion] Sorting refactor

2015-01-16 Thread Eelco Hoogendoorn
On 16 January 2015 at 13:15, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Perhaps an algorithm can

Re: [Numpy-discussion] Question about dtype

2014-12-13 Thread Eelco Hoogendoorn
This is a general problem in trying to use JSON to send arbitrary python objects. It's not made for that purpose; JSON itself only supports a very limited grammar (only one sequence type, for instance, as you noticed), so in general you will need to specify your own encoding/decoding for more
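One way to "specify your own encoding/decoding" for arrays is a `json.JSONEncoder` subclass plus an `object_hook`. This is a minimal sketch: the `__ndarray__` tag and field names are arbitrary choices, and structured or object dtypes are not handled.

```python
import json
import numpy as np

class NDArrayEncoder(json.JSONEncoder):
    """Encode ndarrays as tagged dicts; JSON itself has no dtype or shape notion."""
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return {"__ndarray__": obj.tolist(),
                    "dtype": obj.dtype.str,   # e.g. '<i2'
                    "shape": obj.shape}
        return super().default(obj)

def ndarray_hook(d):
    """Inverse of NDArrayEncoder, for json.loads(..., object_hook=ndarray_hook)."""
    if "__ndarray__" in d:
        return np.array(d["__ndarray__"], dtype=d["dtype"]).reshape(d["shape"])
    return d

payload = {"x": np.arange(6, dtype=np.int16).reshape(2, 3), "label": "demo"}
text = json.dumps(payload, cls=NDArrayEncoder)
restored = json.loads(text, object_hook=ndarray_hook)
```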

Re: [Numpy-discussion] Should ndarray be a context manager?

2014-12-09 Thread Eelco Hoogendoorn
My impression is that this level of optimization does not, and should not, fall within the scope of numpy. -Original Message- From: Sturla Molden sturla.mol...@gmail.com Sent: 9-12-2014 16:02 To: numpy-discussion@scipy.org Subject: [Numpy-discussion] Should

Re: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions)

2014-10-29 Thread Eelco Hoogendoorn
: Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Perhaps the 'batteries included' philosophy made sense in the early days of numpy; but given that there are several fft libraries with their own pros and cons, and that most numpy projects will use none of them at all, why should

Re: [Numpy-discussion] help using np.einsum for stacked matrix multiplication

2014-10-29 Thread Eelco Hoogendoorn
You need to specify your input format. Also, if your output matrix misses the NY dimension, that implies you wish to contract (sum) over it, which contradicts your statement that the 2x2 subblocks form the matrices to multiply with. In general, I think it would help if you give a little more
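Assuming the input is laid out as an (NX, NY, 2, 2) grid of 2x2 blocks (the original post's format is not shown), the difference between keeping and dropping the NY index looks like this:

```python
import numpy as np

rng = np.random.default_rng(1)
NX, NY = 4, 3
A = rng.random((NX, NY, 2, 2))  # assumed layout: an NX-by-NY grid of 2x2 matrices
B = rng.random((NX, NY, 2, 2))

# Keep both grid axes (x, y) in the output; contract only over j inside each block.
C = np.einsum('xyij,xyjk->xyik', A, B)

# Omitting 'y' from the output also sums the per-block products over the NY axis.
C_summed = np.einsum('xyij,xyjk->xik', A, B)
```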

Re: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions)

2014-10-28 Thread Eelco Hoogendoorn
If I may 'hijack' the discussion back to the meta-point: should we be having this discussion on the numpy mailing list at all? Perhaps the 'batteries included' philosophy made sense in the early days of numpy; but given that there are several fft libraries with their own pros and cons, and that

Re: [Numpy-discussion] Choosing between NumPy and SciPy functions

2014-10-27 Thread Eelco Hoogendoorn
The same occurred to me when reading that question. My personal opinion is that such functionality should be deprecated from numpy. I don't know who said this, but it really stuck with me: but the power of numpy is first and foremost in it being a fantastic interface, not in being a library.

Re: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt

2014-10-26 Thread Eelco Hoogendoorn
I'm not sure why the memory doubling is necessary. Isn't it possible to preallocate the arrays and write to them? I suppose this might be inefficient though, in case you end up reading only a small subset of rows out of a mostly corrupt file? But that seems to be a rather uncommon corner case.
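A rough sketch of the preallocation idea, assuming a seekable, whitespace-delimited text stream with a known column count. `load_preallocated` is a hypothetical name; it trades a second pass over the text for never holding all parsed rows in Python lists at once.

```python
import io
import numpy as np

def load_preallocated(stream, ncols, dtype=np.float64):
    """Hypothetical two-pass loader: count rows, allocate once, fill in place."""
    # Pass 1: count non-blank lines without keeping any parsed data around.
    nrows = sum(1 for line in stream if line.strip())
    stream.seek(0)
    out = np.empty((nrows, ncols), dtype=dtype)
    # Pass 2: parse each row straight into its slot of the preallocated array.
    row = 0
    for line in stream:
        if line.strip():
            out[row] = [float(tok) for tok in line.split()]
            row += 1
    return out

text = io.StringIO("1 2 3\n4 5 6\n\n7 8 9\n")
arr = load_preallocated(text, ncols=3)
```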

Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Eelco Hoogendoorn
Thanks Warren, I think these are sensible additions. I would argue to treat the None-False condition as an error. Indeed I agree one might argue the correct behavior is to 'shuffle' the singleton block of data, which does nothing; but it's more likely to come up as an unintended error than as a

Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread Eelco Hoogendoorn
yeah, a shuffle function that does not shuffle indeed seems like a major source of bugs to me. Indeed one could argue that setting axis=None should suffice to give a clear enough declaration of intent; though I wouldn't mind typing the extra bit to ensure consistent semantics. On Sun, Oct 12,
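Shuffling along an arbitrary axis can be done by indexing with a random permutation. A sketch: `shuffle_along` is hypothetical and returns a copy rather than shuffling in place. (Newer NumPy `Generator` objects also accept an `axis` argument to `shuffle`.)

```python
import numpy as np

def shuffle_along(a, axis, rng=None):
    """Hypothetical helper: copy of `a` with whole slices permuted along `axis`."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(a.shape[axis])
    return np.take(a, perm, axis=axis)

a = np.arange(12).reshape(3, 4)
s = shuffle_along(a, axis=1, rng=np.random.default_rng(0))  # columns move as units
```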

Re: [Numpy-discussion] 0/0 == 0?

2014-10-03 Thread Eelco Hoogendoorn
Slightly OT, but FWIW, it's all ill-thought-out nonsense from the start anyway. ALL numbers satisfy the predicate 0*x=0. What the IEEE calls 'not a number' would be more accurately called 'not a specific number', or 'a number'. What's a logical negation among computer scientists? On Fri, Oct 3,
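For comparison, a short sketch of how NumPy actually behaves: floating-point 0/0 yields NaN, which is unequal even to itself, while integer division by zero returns 0 (with a RuntimeWarning unless suppressed).

```python
import numpy as np

with np.errstate(invalid='ignore'):   # silence the 0/0 RuntimeWarning
    q = np.float64(0.0) / np.float64(0.0)

print(np.isnan(q))   # True: 0.0/0.0 gives NaN
print(q == q)        # False: NaN compares unequal even to itself

with np.errstate(all='ignore'):       # integer division by zero also warns
    z = np.array([0]) // np.array([0])

print(z)             # [0]: NumPy returns 0 for integer division by zero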

Re: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names

2014-10-01 Thread Eelco Hoogendoorn
the kind of duck it is, so to speak. Indeed it seems like an atypical design pattern; but I don't see a problem with it. On Wed, Oct 1, 2014 at 4:08 PM, John Zwinck jzwi...@gmail.com wrote: On 1 Oct 2014 04:30, Stephan Hoyer sho...@gmail.com wrote: On Tue, Sep 30, 2014 at 1:22 PM, Eelco

Re: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names

2014-09-30 Thread Eelco Hoogendoorn
Sounds fair to me. Indeed the duck-typing argument makes sense, and I have a hard time imagining any namespace conflicts or other confusion. Should this attribute return None for non-structured arrays, or simply be undefined? On Tue, Sep 30, 2014 at 12:49 PM, John Zwinck jzwi...@gmail.com wrote:

Re: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names

2014-09-30 Thread Eelco Hoogendoorn
So a non-structured array should return an empty list/iterable as its keys? That doesn't seem right to me, but perhaps you have a compelling example to the contrary. I mean, wouldn't we want the duck-typing to fail if it isn't a structured array? Throwing an AttributeError seems like the best

Re: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names

2014-09-30 Thread Eelco Hoogendoorn
On more careful reading of your words, I think we agree; indeed, if keys() is present it should return an iterable; but I don't think it should be present for non-structured arrays. On Tue, Sep 30, 2014 at 10:21 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: So a non-structured array
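The semantics argued for here (a `keys()` only where it is meaningful, `AttributeError` otherwise) can be sketched with a hypothetical free function on top of `dtype.names`, which is `None` for non-structured arrays:

```python
import numpy as np

def structured_keys(arr):
    """Hypothetical keys(): field names, or AttributeError for plain arrays."""
    names = arr.dtype.names   # tuple of field names, or None if not structured
    if names is None:
        raise AttributeError("non-structured array has no keys()")
    return names

rec = np.zeros(3, dtype=[('x', 'f8'), ('y', 'i4')])
print(structured_keys(rec))   # ('x', 'y')
```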

Re: [Numpy-discussion] Tracking and inspecting numpy objects

2014-09-15 Thread Eelco Hoogendoorn
On Mon, Sep 15, 2014 at 11:55 AM, Sebastian Berg sebast...@sipsolutions.net wrote: On Mo, 2014-09-15 at 10:16 +0200, Mads Ipsen wrote: Hi, I am trying to inspect the reference count of numpy arrays generated by my application. Initially, I thought I could inspect the tracked objects

Re: [Numpy-discussion] Tracking and inspecting numpy objects

2014-09-15 Thread Eelco Hoogendoorn
for taking time to answer! Best regards, Mads On 15/09/14 12:11, Sebastian Berg wrote: On Mo, 2014-09-15 at 12:05 +0200, Eelco Hoogendoorn wrote: On Mon, Sep 15, 2014 at 11:55 AM, Sebastian Berg sebast...@sipsolutions.net wrote: On Mo, 2014-09-15 at 10:16 +0200, Mads Ipsen

Re: [Numpy-discussion] why does u.resize return None?

2014-09-11 Thread Eelco Hoogendoorn
Agreed; I never saw the logic in returning None either. On Thu, Sep 11, 2014 at 4:27 PM, Neal Becker ndbeck...@gmail.com wrote: It would be useful if u.resize returned the new array, so it could be used for chaining operations -- -- Those who don't understand recursion are doomed to repeat
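The asymmetry under discussion, sketched: the `ndarray.resize` method works in place and returns `None`, while the `np.resize` function returns a new array and fills it differently. `refcheck=False` is used only because this freshly created array has no other references.

```python
import numpy as np

u = np.arange(4)
ret = u.resize(6, refcheck=False)  # in place; safe here since nothing else views u
print(ret)   # None: the method mutates u instead of returning it
print(u)     # [0 1 2 3 0 0]: new slots are zero-filled

v = np.resize(np.arange(4), 6)     # function form returns a new array
print(v)     # [0 1 2 3 0 1]: data is repeated cyclically
```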

Re: [Numpy-discussion] Generalize hstack/vstack -- stack; Block matrices like in matlab

2014-09-08 Thread Eelco Hoogendoorn
Sturla: I'm not sure the intention is always unambiguous for such more flexible arrangements. Also, I doubt such situations arise often in practice; if the arrays aren't a grid, they are probably a nested grid, and the code would most naturally concatenate them with nested calls to a stacking
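For the common grid case, nested stacking calls and `np.block` (added later, in NumPy 1.13) give the same result. A small sketch with arbitrary block shapes:

```python
import numpy as np

A = np.ones((2, 2)); B = np.zeros((2, 3))
C = np.zeros((1, 2)); D = np.ones((1, 3))

# Nested stacking: build each block row with hstack, then stack the rows.
M_nested = np.vstack([np.hstack([A, B]), np.hstack([C, D])])

# np.block expresses the same grid directly from a nested list.
M_block = np.block([[A, B], [C, D]])
```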

Re: [Numpy-discussion] Generalize hstack/vstack -- stack; Blockmatrices like in matlab

2014-09-08 Thread Eelco Hoogendoorn
the world... I think just having this generalize stack feature would be nice start. Tetris could be built on top of that later. (Although, I do vote for at least 3 or 4 dimensional stacking, if possible). Cheers! Ben Root On Mon, Sep 8, 2014 at 12:41 PM, Eelco Hoogendoorn hoogendoorn.ee

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-09-04 Thread Eelco Hoogendoorn
On Wed, Sep 3, 2014 at 6:46 PM, Jaime Fernández del Río jaime.f...@gmail.com wrote: On Wed, Sep 3, 2014 at 9:33 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote: On Wed, Sep 3, 2014 at 6:41 AM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Not sure about the hashing

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-09-04 Thread Eelco Hoogendoorn
On Thu, Sep 4, 2014 at 10:31 AM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: On Wed, Sep 3, 2014 at 6:46 PM, Jaime Fernández del Río jaime.f...@gmail.com wrote: On Wed, Sep 3, 2014 at 9:33 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote: On Wed, Sep 3, 2014 at 6:41 AM

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-09-04 Thread Eelco Hoogendoorn
that big a concern in the first place. On Thu, Sep 4, 2014 at 7:55 PM, Jaime Fernández del Río jaime.f...@gmail.com wrote: On Thu, Sep 4, 2014 at 10:39 AM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: On Thu, Sep 4, 2014 at 10:31 AM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-09-04 Thread Eelco Hoogendoorn
On Thu, Sep 4, 2014 at 8:14 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: I should clarify: I am speaking about my implementation; I haven't looked at the numpy implementation for a while, so I'm not sure what it is up to. Note that by 'almost free', we are still talking about three

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-09-04 Thread Eelco Hoogendoorn
= Series(a) # without the creation overhead In [12]: %timeit s.unique() 1 loops, best of 3: 75.3 µs per loop On Thu, Sep 4, 2014 at 2:29 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: On Thu, Sep 4, 2014 at 8:14 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-09-03 Thread Eelco Hoogendoorn
On Wed, Sep 3, 2014 at 4:07 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote: On Tue, Sep 2, 2014 at 5:40 PM, Charles R Harris charlesr.har...@gmail.com wrote: What do you think about the suggestion of timsort? One would need to concatenate the arrays before sorting, but it should

Re: [Numpy-discussion] Give Jaime Fernandez commit rights.

2014-09-03 Thread Eelco Hoogendoorn
+1; though I am relatively new to the scene, Jaime's contributions have always stood out to me as thoughtful. On Thu, Sep 4, 2014 at 12:42 AM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Wed, Sep 3, 2014 at 11:48 PM, Robert Kern robert.k...@gmail.com wrote: On Wed, Sep 3, 2014 at 10:47

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-09-01 Thread Eelco Hoogendoorn
, a significant fraction of all stackoverflow numpy questions are (unknowingly) exactly about 'how to do grouping in numpy'. On Mon, Sep 1, 2014 at 4:36 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Aug 31, 2014 at 1:48 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-09-01 Thread Eelco Hoogendoorn
On Mon, Sep 1, 2014 at 2:05 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Sep 1, 2014 at 1:49 AM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Sure, I'd like to do the hashing things out, but I would also like some preliminary feedback as to whether this is going

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-31 Thread Eelco Hoogendoorn
on. You mentioned getting the numpy core developers involved; are they not subscribed to this mailing list? I wouldn't be surprised; you'd hope there is a channel of discussion concerning development with higher signal to noise On Thu, Aug 28, 2014 at 1:49 AM, Eelco Hoogendoorn hoogendoorn.ee

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Eelco Hoogendoorn
It wouldn't hurt to have this function, but my intuition is that its use will be minimal. If you are already working with sorted arrays, you already have a flop cost on that order of magnitude, and the optimized merge saves you a factor two at the very most. Using numpy means you are sacrificing
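To make the cost argument concrete: a merge of two already-sorted arrays can be written in vectorized NumPy with one `np.searchsorted` call instead of re-sorting the concatenation. This is an illustrative sketch, not the proposed `mergesorted` implementation.

```python
import numpy as np

def merge_sorted(a, b):
    """Illustrative merge of two sorted 1-D arrays (not a NumPy function).

    One binary search per element of b replaces a full re-sort of a and b combined.
    """
    # Slot of b[j] in the output: (# of a-elements <= b[j]) + (j earlier b-elements).
    pos_b = np.searchsorted(a, b, side='right') + np.arange(len(b))
    out = np.empty(len(a) + len(b), dtype=np.result_type(a, b))
    from_a = np.ones(len(out), dtype=bool)
    from_a[pos_b] = False
    out[pos_b] = b
    out[from_a] = a   # the remaining slots receive a, in its original order
    return out

print(merge_sorted(np.array([1, 3, 5, 7]), np.array([2, 3, 8])))  # [1 2 3 3 5 7 8]
```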

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Eelco Hoogendoorn
of the numpy devs would be helpful in getting this somewhere. Jaime On Wed, Aug 27, 2014 at 10:35 AM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: It wouldn't hurt to have this function, but my intuition is that its use will be minimal. If you are already working with sorted arrays, you

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Eelco Hoogendoorn
in zip(*group_by(keys)(values)): print k, g.mean(0) On Wed, Aug 27, 2014 at 9:29 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: f.i., this works as expected as well (100 keys of 1d int arrays and 100 values of 1d float arrays): group_by(randint(0,4,(100,2))).mean(rand(100,2
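The `group_by(...)` call quoted above comes from the author's own code, whose API is not shown here. As a hedged stand-in, the same per-key mean can be computed with plain NumPy via `np.unique(..., return_inverse=True)` and `np.bincount`, for 1-D keys and values:

```python
import numpy as np

def group_mean(keys, values):
    """Per-key mean of 1-D `values` (sketch; not the group_by API quoted above)."""
    uniq, inverse = np.unique(keys, return_inverse=True)  # sort-based grouping
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return uniq, sums / counts

keys = np.array([1, 0, 1, 2, 0])
values = np.array([1.0, 2.0, 3.0, 4.0, 6.0])
uniq, means = group_mean(keys, values)   # uniq [0 1 2], means [4. 2. 4.]
```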

Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Eelco Hoogendoorn
true vectorization on those operations. The way I see it, numpy may not have to have a GroupBy implementation, but it should at least enable implementing one that is fast and efficient over any axis. On Wed, Aug 27, 2014 at 12:38 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote

Re: [Numpy-discussion] np.unique with structured arrays

2014-08-22 Thread Eelco Hoogendoorn
It does not sound like an issue with unique, but rather like a matter of floating point equality and representation. Do the 'identical' elements pass an equality test? -Original Message- From: Nicolas P. Rougier nicolas.roug...@inria.fr Sent: 22-8-2014 15:21 To: Discussion of

Re: [Numpy-discussion] np.unique with structured arrays

2014-08-22 Thread Eelco Hoogendoorn
Oh yeah, this could be. Floating point equality and bitwise equality are not the same thing. -Original Message- From: Jaime Fernández del Río jaime.f...@gmail.com Sent: 22-8-2014 16:22 To: Discussion of Numerical Python numpy-discussion@scipy.org Subject: Re: [Numpy-discussion]
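A short sketch of the distinction: IEEE equality and bit-pattern equality disagree for signed zeros and for NaN.

```python
import numpy as np

a = np.array([0.0, -0.0, np.nan, np.nan])
bits = a.view(np.uint64)   # the same 8 bytes per element, read as integers

print(a[0] == a[1], bits[0] == bits[1])   # True False  (+0.0 vs -0.0)
print(a[2] == a[3], bits[2] == bits[3])   # False True  (identical NaN bit patterns)
```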

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

2014-08-15 Thread Eelco Hoogendoorn
Agreed; this addition occurred to me as well. Note that the implementation should be straightforward: just allocate an enlarged array, use some striding logic to construct the relevant view, and let einsum's internals act on the view. Hopefully, you won't even have to touch the guts of einsum at

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

2014-08-15 Thread Eelco Hoogendoorn
indexing once... I'll see if I can dig that up. On Fri, Aug 15, 2014 at 5:01 PM, Sebastian Berg sebast...@sipsolutions.net wrote: On Fr, 2014-08-15 at 16:42 +0200, Eelco Hoogendoorn wrote: Agreed; this addition occurred to me as well. Note that the implementation should be straightforward

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

2014-08-15 Thread Eelco Hoogendoorn
here is a snippet I extracted from a project with similar aims (integrating the functionality of einsum and numexpr, actually) Not much to it, but in case someone needs a reminder on how to use striding tricks: http://pastebin.com/kQNySjcj On Fri, Aug 15, 2014 at 5:20 PM, Eelco Hoogendoorn
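The pastebin snippet is not reproduced here. As an independent sketch of the approach described in the thread, this emulates 'i,i->ii' by allocating the enlarged output, building a strided view of its diagonal with `as_strided`, and letting `np.einsum` write into that view. `diag_product` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def diag_product(a, b):
    """Hypothetical helper emulating einsum('i,i->ii', a, b)."""
    n = a.shape[0]
    out = np.zeros((n, n), dtype=np.result_type(a, b))
    # Stepping one row AND one column per element walks the main diagonal.
    diag = as_strided(out, shape=(n,), strides=(out.strides[0] + out.strides[1],))
    np.einsum('i,i->i', a, b, out=diag)   # einsum writes through the view into out
    return out

res = diag_product(np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0]))
```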

Re: [Numpy-discussion] New function `count_unique` to generate contingency tables.

2014-08-13 Thread Eelco Hoogendoorn
be fine with me. Warren On Wed, Aug 13, 2014 at 4:57 PM, Warren Weckesser warren.weckes...@gmail.com wrote: On Tue, Aug 12, 2014 at 12:51 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: ah yes, that's also an issue I was trying to deal with. the semantics I prefer

Re: [Numpy-discussion] New function `count_unique` to generate contingency tables.

2014-08-12 Thread Eelco Hoogendoorn
Thanks. Prompted by that stackoverflow question, and similar problems I had to deal with myself, I started working on a much more general extension to numpy's functionality in this space. Like you noted, things get a little panda-y, but I think there is a lot of pandas functionality that could or
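As a small illustration of the kind of function under discussion, a two-way contingency table can be built from `np.unique(..., return_inverse=True)` and `np.add.at`. `contingency` is a hypothetical helper, not the proposed `count_unique` API.

```python
import numpy as np

def contingency(x, y):
    """Hypothetical helper: co-occurrence counts of the labels in x and y."""
    xu, xi = np.unique(x, return_inverse=True)
    yu, yi = np.unique(y, return_inverse=True)
    table = np.zeros((len(xu), len(yu)), dtype=np.intp)
    np.add.at(table, (xi, yi), 1)   # unbuffered, so repeated (xi, yi) pairs all count
    return xu, yu, table

xu, yu, table = contingency(np.array(['a', 'b', 'a', 'a']), np.array([1, 1, 2, 1]))
# xu ['a' 'b'], yu [1 2], table [[2 1] [1 0]]
```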

Re: [Numpy-discussion] New function `count_unique` to generate contingency tables.

2014-08-12 Thread Eelco Hoogendoorn
. I also agree that the extension you propose here is useful; but ideally, with a little more discussion on these subjects we can converge on an even more comprehensive overhaul On Tue, Aug 12, 2014 at 6:33 PM, Joe Kington joferking...@gmail.com wrote: On Tue, Aug 12, 2014 at 11:17 AM, Eelco

Re: [Numpy-discussion] Calculation of a hessian

2014-08-08 Thread Eelco Hoogendoorn
Do it in pure numpy? How about copying the source of numdifftools? What exactly is the obstacle to using numdifftools? There seem to be no licensing issues. In my experience, it's a crafty piece of work, and calculating a Hessian correctly, accounting for all kinds of nasty floating point issues,
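To show what "pure numpy" would minimally involve, here is a naive central-difference Hessian with a fixed step. It is a sketch only: unlike numdifftools it does no adaptive step selection or error estimation, which is exactly the floating-point care the post refers to.

```python
import numpy as np

def hessian_fd(f, x, h=1e-4):
    """Naive central-difference Hessian of scalar f at x; fixed step h (sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.empty((n, n))
    E = np.eye(n) * h   # row i is the step h along coordinate i
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * h * h)
    return H

A = np.array([[3.0, 1.0], [1.0, 2.0]])
H = hessian_fd(lambda v: 0.5 * v @ A @ v, [0.3, -1.2])   # quadratic: Hessian is A
```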

Re: [Numpy-discussion] Preliminary thoughts on implementing __matmul__

2014-08-07 Thread Eelco Hoogendoorn
I don't expect stacked matrices/vectors to be used often, although there are some areas that might make heavy use of them, so I think we could live with the simple implementation, it's just a bit of a wart when there is broadcasting of arrays. Just to be clear, the '@' broadcasting differs from
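The difference between `@` and `np.dot` on stacked (3-D) operands, sketched with arbitrary shapes (requires Python 3.5+ and NumPy 1.10+):

```python
import numpy as np

A = np.ones((5, 2, 3))   # a stack of five 2x3 matrices
B = np.ones((5, 3, 4))   # a stack of five 3x4 matrices

stacked = A @ B          # shape (5, 2, 4): matrices paired up along the stack axis
outer = np.dot(A, B)     # shape (5, 2, 5, 4): every matrix of A times every matrix of B
```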

Re: [Numpy-discussion] Array2 subset of array1

2014-08-05 Thread Eelco Hoogendoorn
np.all(np.in1d(array2, array1)) On Tue, Aug 5, 2014 at 2:58 PM, Jurgens de Bruin debrui...@gmail.com wrote: Hi, I am new to numpy so any help would be greatly appreciated. I have two arrays: array1 = np.arange(1,100+1) array2 = np.arange(1,50+1) How can I calculate/determine if

Re: [Numpy-discussion] Array2 subset of array1

2014-08-05 Thread Eelco Hoogendoorn
Ah yes, that may indeed be what you want. Depending on your datatype, you could access the underlying raw data as a string: b.tostring() in a.tostring() sort of works, but isn't entirely safe, as you may have false-positive matches which aren't aligned to your datatype. Using str.find in combination
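A sketch of the alignment check mentioned above, using `ndarray.tobytes()` (the newer name for `tostring()`) and `bytes.find`: a match is accepted only if its byte offset is a multiple of the item size. `contains_subarray` is a hypothetical helper and assumes 1-D arrays of a common dtype.

```python
import numpy as np

def contains_subarray(a, b):
    """Hypothetical helper: True if b occurs as a contiguous run inside 1-D a."""
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b, dtype=a.dtype)
    hay, needle = a.tobytes(), b.tobytes()
    start = hay.find(needle)
    while start != -1:
        if start % a.itemsize == 0:      # reject matches straddling element boundaries
            return True
        start = hay.find(needle, start + 1)
    return False

print(contains_subarray(np.arange(10), np.array([3, 4, 5])))   # True
```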

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Eelco Hoogendoorn
To rephrase my most pressing question: may np.ones((N,2)).mean(0) and np.ones((2,N)).mean(1) produce different results with the implementation in the current master? If so, I think that would be very much regrettable; and if this is a minority opinion, I do hope that at least this gets documented

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Eelco Hoogendoorn
Sebastian: Those are good points. Indeed iteration order may already produce different results, even though the semantics of numpy suggest identical operations. Still, I feel this different behavior without any semantical clues is something to be minimized. Indeed copying might have large speed

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Eelco Hoogendoorn
+0200, Eelco Hoogendoorn wrote: Sebastian: Those are good points. Indeed iteration order may already produce different results, even though the semantics of numpy suggest identical operations. Still, I feel this different behavior without any semantical clues is something

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Eelco Hoogendoorn
- From: Julian Taylor jtaylor.deb...@googlemail.com Sent: 26-7-2014 00:58 To: Discussion of Numerical Python numpy-discussion@scipy.org Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays On 25.07.2014 23:51, Eelco Hoogendoorn wrote: Ray: I'm not working with Hubble

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Eelco Hoogendoorn
I was wondering the same thing. Are there any known tradeoffs to this method of reduction? On Sat, Jul 26, 2014 at 12:39 PM, Sturla Molden sturla.mol...@gmail.com wrote: Sebastian Berg sebast...@sipsolutions.net wrote: chose more stable algorithms for such statistical functions. The

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Eelco Hoogendoorn
of the benefits, without any of the drawbacks? On Sat, Jul 26, 2014 at 3:53 PM, Julian Taylor jtaylor.deb...@googlemail.com wrote: On 26.07.2014 15:38, Eelco Hoogendoorn wrote: Why is it not always used? for 1d reduction the iterator blocks by 8192 elements even when no buffering is required

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Eelco Hoogendoorn
Perhaps I in turn am missing something; but I would suppose that any algorithm that requires multiple passes over the data is off the table? Perhaps I am being a little old fashioned and performance oriented here, but to make the ultra-majority of use cases suffer a factor two performance penalty

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Eelco Hoogendoorn
that does not support the specified precision, rather than obtain subtly or horribly broken results without warning when moving your code to a different platform/compiler whatever. On Fri, Jul 25, 2014 at 5:37 AM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Perhaps it is a slightly semantical

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Eelco Hoogendoorn
Arguably, the whole of floating point numbers and their related shenanigans is not very pythonic in the first place. The accuracy of the output WILL depend on the input, to some degree or another. At the risk of repeating myself: explicit is better than implicit

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Eelco Hoogendoorn
framework. The ability to specify different algorithms per kwarg wouldn't be a bad idea either, imo; or the ability to explicitly specify a separate output and accumulator dtype. On Fri, Jul 25, 2014 at 8:00 PM, Alan G Isaac alan.is...@gmail.com wrote: On 7/25/2014 1:40 PM, Eelco Hoogendoorn wrote
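The existing `dtype` argument already lets the caller choose a wider accumulator for a reduction. A sketch (array size and values are arbitrary); note that `dtype` sets both the accumulator and the output type, so narrowing back to float32 is a separate explicit step.

```python
import numpy as np

x = np.full(10**6, 0.1, dtype=np.float32)
exact = float(np.float32(0.1))     # the value every element actually holds

m32 = x.mean()                     # float32 accumulator (the default for float32 input)
m64 = x.mean(dtype=np.float64)     # explicitly requested float64 accumulator and output
back = np.float32(m64)             # narrowing the result is a separate, explicit step

print(abs(float(m32) - exact), abs(m64 - exact))
```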

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Eelco Hoogendoorn
Ray: I'm not working with Hubble data, but yeah, these are all issues I've run into with my terabytes of microscopy data as well. Given that such raw data comes as uint16, it's best to do your calculations as much as possible in good old ints. What you compute is what you get, no obscure
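A sketch of the "good old ints" point with synthetic uint16 data: an integer accumulator gives an exact total, whereas forcing the accumulator to stay uint16 wraps around silently.

```python
import numpy as np

frame = np.full((512, 512), 65535, dtype=np.uint16)   # synthetic saturated 16-bit image

total = frame.sum(dtype=np.int64)     # exact integer total, no rounding anywhere
wrapped = frame.sum(dtype=np.uint16)  # accumulator kept at 16 bits: wraps modulo 2**16

print(total)     # 17179607040
print(wrapped)   # 0
```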

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Eelco Hoogendoorn
Arguably, this isn't a problem of numpy, but of programmers being trained to think of floating point numbers as 'real' numbers, rather than just a finite number of states with a funny distribution over the number line. np.mean isn't broken; your understanding of floating point number is. What you
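Two small illustrations of "a finite number of states with a funny distribution": float addition is not associative, and in float32 (24-bit significand) adding 1 to 2**24 is lost to rounding.

```python
import numpy as np

# Grouping changes the rounding, so float addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)   # False

# Above 2**24, float32 cannot represent every integer, so + 1 can vanish entirely.
big = np.float32(2**24)
print(big + np.float32(1) == big)   # True
```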

Re: [Numpy-discussion] numpy.mean still broken for large float32arrays

2014-07-24 Thread Eelco Hoogendoorn
:09 To: Discussion of Numerical Python numpy-discussion@scipy.org Subject: Re: [Numpy-discussion] numpy.mean still broken for large float32arrays On 7/24/2014 5:59 AM, Eelco Hoogendoorn wrote to Thomas: np.mean isn't broken; your understanding of floating point number is. This comment seems

Re: [Numpy-discussion] numpy.mean still broken for large float32arrays

2014-07-24 Thread Eelco Hoogendoorn
Inaccurate and utterly wrong are subjective. If you want to be sufficiently strict, floating point calculations are almost always 'utterly wrong'. Granted, it would be nice if the docs specified the algorithm used. But numpy does not produce anything different from what a standard C loop or

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-24 Thread Eelco Hoogendoorn
alan.is...@gmail.com Sent: 25-7-2014 00:10 To: Discussion of Numerical Python numpy-discussion@scipy.org Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays On 7/24/2014 4:42 PM, Eelco Hoogendoorn wrote: This isn't a bug report, but rather a feature request. I'm

Re: [Numpy-discussion] Find n closest values

2014-06-22 Thread Eelco Hoogendoorn
... Nicolas On 22 Jun 2014, at 10:30, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Perhaps you could simplify some statements, but at least the algorithmic complexity is fine, and everything is vectorized, so I doubt you will get huge gains. You could take a look

Re: [Numpy-discussion] Find n closest values

2014-06-22 Thread Eelco Hoogendoorn
Also, if you use scipy.spatial.KDTree, make sure to use cKDTree; the native python kdtree is sure to be slow as hell. On Sun, Jun 22, 2014 at 7:05 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Well, if the spacing is truly uniform, then of course you don't really need the search
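For a sorted 1-D array, the nearest element to each query can also be found without a tree, using `np.searchsorted`. A sketch, assuming `L` is sorted with at least two elements; `nearest_indices` is a hypothetical name, and ties go to the left neighbour.

```python
import numpy as np

def nearest_indices(L, I):
    """Hypothetical helper: index of the element of sorted L closest to each query."""
    idx = np.searchsorted(L, I)            # insertion points, in [0, len(L)]
    idx = np.clip(idx, 1, len(L) - 1)      # ensure both neighbours exist
    left, right = L[idx - 1], L[idx]
    # Take the left neighbour wherever it is at least as close as the right one.
    return np.where((I - left) <= (right - I), idx - 1, idx)
    # With truly uniform spacing dx, np.rint((I - L[0]) / dx) clipped to range suffices.

L = np.array([0.0, 1.0, 2.0, 5.0, 10.0])
I = np.array([-3.0, 0.4, 0.6, 3.4, 3.6, 7.4, 12.0])
print(nearest_indices(L, I))   # [0 0 1 2 3 3 4]
```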

Re: [Numpy-discussion] Find n closest values

2014-06-22 Thread Eelco Hoogendoorn
...@inria.fr wrote: Thanks, I'll try your solution. Data (L) is not so big actually, it represents pixels on screen and (I) represents line position (for grids). I need to compute this quantity everytime the user zoom in or out. Nicolas On 22 Jun 2014, at 19:05, Eelco Hoogendoorn

Re: [Numpy-discussion] Find n closest values

2014-06-22 Thread Eelco Hoogendoorn
but the way you wrote it might open the door for other improvements. Thanks. Nicolas On 22 Jun 2014, at 21:14, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Protip: if you are writing your own rasterization code in python, be prepared to forget about performance altogether

Re: [Numpy-discussion] Easter Egg or what I am missing here?

2014-05-21 Thread Eelco Hoogendoorn
I agree; this 'wart' has also messed with my code a few times. I didn't find it to be the case two years ago, but perhaps I should reevaluate if the scientific python stack has sufficiently migrated to python 3. On Thu, May 22, 2014 at 7:35 AM, Siegfried Gonzi siegfried.go...@ed.ac.uk wrote: On

Re: [Numpy-discussion] repeat an array without allocation

2014-05-05 Thread Eelco Hoogendoorn
If b is indeed big I don't see a problem with the python loop, elegance aside; but Cython will not beat it on that front. On Mon, May 5, 2014 at 9:34 AM, srean srean.l...@gmail.com wrote: Great ! thanks. I should have seen that. Is there any way array multiplication (as opposed to matrix

Re: [Numpy-discussion] repeat an array without allocation

2014-05-04 Thread Eelco Hoogendoorn
Nope; it's impossible to express A as a strided view on x for the repeats you have. Even if you had uniform repeats, it still would not work: that would make it easy to add an extra axis to x without a new allocation, but reshaping/merging that axis with axis=0 would again trigger a copy, as it
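A sketch of the uniform-repeat case described above: a zero-stride view via `np.broadcast_to` needs no new allocation, but flattening the repeat axis into axis 0 forces a copy (`np.shares_memory` is used to check).

```python
import numpy as np

x = np.arange(3, dtype=np.int64)

# Uniform repeats as a zero-stride view: the data of x is not duplicated.
v = np.broadcast_to(x[:, None], (3, 4))
print(v.strides)                  # (8, 0)
print(np.shares_memory(v, x))     # True

# Merging the repeat axis into axis 0 is not expressible with strides, so it copies.
flat = v.reshape(-1)
print(np.shares_memory(flat, x))  # False
```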

Re: [Numpy-discussion] arrays and : behaviour

2014-05-01 Thread Eelco Hoogendoorn
Your problem isn't with colon indexing, but with the interpretation of the arguments to plot. Multiple calls to plot with scalar arguments do not have the same result as a single call with array arguments. For this to work as intended, you would need plt.hold(True), for starters, and maybe there

Re: [Numpy-discussion] numerical gradient, Jacobian, and Hessian

2014-04-21 Thread Eelco Hoogendoorn
I was going to suggest numdifftools; it's a very capable package in my experience. Indeed it would be nice to have it integrated into scipy. Also, in case trying to calculate a numerical gradient is a case of 'the math getting too bothersome' rather than no closed form gradient actually existing:

Re: [Numpy-discussion] string replace

2014-04-21 Thread Eelco Hoogendoorn
Indeed this isn't numpy, and I don't see how your colleagues' opinions have any bearing on that issue; but anyway. There isn't a 'python' way to do this; the best method involves some form of parsing library. Undoubtedly there is a one-line regex to do this kind of thing, but regexes are themselves the

Re: [Numpy-discussion] min depth to nonzero in 3d array

2014-04-17 Thread Eelco Hoogendoorn
I agree; argmax would be the best option here, though I would hardly call it abuse. It seems perfectly readable and idiomatic to me. Though the != comparison requires an extra pass over the array, that's the kind of tradeoff you make in using numpy. On Thu, Apr 17, 2014 at 7:45 PM, Stephan Hoyer
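The argmax idiom, sketched with one common caveat handled: `argmax` on an all-False column returns 0, so columns with no nonzero voxel are masked with a fill value. `first_nonzero_depth` and `fill=-1` are illustrative choices.

```python
import numpy as np

def first_nonzero_depth(vol, axis=0, fill=-1):
    """Hypothetical helper: index of the first nonzero along `axis`, else `fill`."""
    mask = vol != 0                       # the extra pass mentioned above
    depth = mask.argmax(axis=axis)        # argmax returns the FIRST True
    return np.where(mask.any(axis=axis), depth, fill)

vol = np.zeros((4, 2, 2))
vol[2, 0, 0] = 5.0
vol[1, 0, 0] = 3.0
vol[3, 1, 1] = 1.0
print(first_nonzero_depth(vol))   # [[ 1 -1] [-1  3]]
```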

Re: [Numpy-discussion] Wiki page for building numerical stuff on Windows

2014-04-12 Thread Eelco Hoogendoorn
I wonder: how hard would it be to create a more 21st-century oriented BLAS, relying more on code generation tools, and perhaps LLVM/JITting? Wouldn't we get ten times the portability with one-tenth the lines of code? Or is there too much dark magic going on in BLAS for such an approach to come

Re: [Numpy-discussion] Wiki page for building numerical stuff onWindows

2014-04-12 Thread Eelco Hoogendoorn
] Wiki page for building numerical stuff onWindows Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: I wonder: how hard would it be to create a more 21st-century oriented BLAS, relying more on code generation tools, and perhaps LLVM/JITting? Wouldn't we get ten times the portability

Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Eelco Hoogendoorn
I agree; breaking code over this would be ridiculous. Also, I prefer the zero default, despite the mean/std combo probably being more common. On Tue, Apr 1, 2014 at 10:02 PM, Sturla Molden sturla.mol...@gmail.com wrote: Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote: Personally I
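For reference, the two conventions on a tiny array (values chosen so the arithmetic is easy to check by hand):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
# Mean is 2.5; the squared deviations 2.25 + 0.25 + 0.25 + 2.25 sum to 5.0.
pop = np.std(x)              # ddof=0, the current default: sqrt(5 / 4)
sample = np.std(x, ddof=1)   # Bessel-corrected: sqrt(5 / 3)
```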

Re: [Numpy-discussion] Is there a pure numpy recipe for this?

2014-03-27 Thread Eelco Hoogendoorn
I'd recommend taking a look at pytables as well. It has support for out-of-core array computations on large arrays. On Thu, Mar 27, 2014 at 9:00 PM, RayS r...@blue-cove.com wrote: Thanks for all of the suggestions; we are migrating to 64bit Python soon as well. The environments are Win7 and

Re: [Numpy-discussion] Is there a pure numpy recipe for this?

2014-03-26 Thread Eelco Hoogendoorn
Without looking ahead, here is what I came up with; but I see more elegant solutions have been found already. import numpy as np def as_dense(f, length): i = np.zeros(length+1, int) i[f[0]] = 1 i[f[1]] = -1 return np.cumsum(i)[:-1] def as_sparse(d): diff =
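A self-contained version of `as_dense`, assuming `f` is a pair of index arrays (starts, stops) describing disjoint, non-adjacent half-open intervals. The original `as_sparse` is truncated above, so the inverse below is a separate hypothetical sketch, not the poster's code.

```python
import numpy as np

def as_dense(f, length):
    """0/1 mask from (starts, stops) of disjoint, non-adjacent half-open intervals."""
    i = np.zeros(length + 1, dtype=int)   # plain int: np.int was removed in NumPy 1.24
    i[f[0]] = 1     # +1 where an interval opens
    i[f[1]] = -1    # -1 at the first index past its end
    return np.cumsum(i)[:-1]

def as_sparse(d):
    """Hypothetical inverse: recover (starts, stops) from a 0/1 mask."""
    edges = np.diff(np.concatenate(([0], d, [0])))
    return np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)

dense = as_dense((np.array([1, 5]), np.array([3, 8])), 10)
print(dense)             # [0 1 1 0 0 1 1 1 0 0]
starts, stops = as_sparse(dense)
```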

Re: [Numpy-discussion] Implementing elementary matrices

2014-03-24 Thread Eelco Hoogendoorn
Sounds (marginally) useful, although elementary row/column operations are in practice usually better implemented directly by indexing rather than in an operator form. Though I can see a use for the latter. My suggestion: it's not a common enough operation to deserve a four-letter acronym (assuming
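A sketch of the two forms for elementary row operations: direct indexing versus left-multiplying by an elementary matrix. Both give the same result; the indexing form avoids building and multiplying an n-by-n matrix.

```python
import numpy as np

A = np.arange(12.0).reshape(3, 4)

# Row swap: fancy indexing versus an elementary (permuted identity) matrix.
swapped = A[[1, 0, 2]]
E_swap = np.eye(3)[[1, 0, 2]]

# Row addition (row2 += 2 * row0): in-place indexing versus an elementary matrix.
B = A.copy()
B[2] += 2 * B[0]
E_add = np.eye(3)
E_add[2, 0] = 2.0
```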

Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-18 Thread Eelco Hoogendoorn
there's no way to distinguish between a 2d field of matrices and a 3d field of vectors. I guess this is a repeat of part of what Eelco Hoogendoorn was saying a few posts back. I was just wondering if anyone sees a place, to get @ a little closer to Einsum, for some sort of array class
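A small illustration of the ambiguity: the same `(4, 3, 3)` array can be read as a 1-d field of 3x3 matrices or as a 4x3 field of 3-vectors. Only the subscripts passed to `np.einsum` say which reading is meant. The shapes and random data are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 3, 3))

# Reading 1: a 1-d field of four 3x3 matrices, each applied to its own vector.
v = rng.standard_normal((4, 3))
field_of_matrices = np.einsum('nij,nj->ni', a, v)

# Reading 2: a 2-d (4 x 3) field of 3-vectors, each dotted with one fixed vector.
w = rng.standard_normal(3)
field_of_vectors = np.einsum('abi,i->ab', a, w)

# np.matmul (the @ operator) always treats the last two axes as the matrix
# axes and broadcasts over the rest; with a trailing length-1 axis on v it
# reproduces reading 1.
via_matmul = (a @ v[..., None])[..., 0]

# Both results have shape (4, 3), so the shape alone does not reveal
# which interpretation produced it.
print(field_of_matrices.shape, field_of_vectors.shape)
```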

Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-18 Thread Eelco Hoogendoorn
My two cents, Sebastian Haase On Tue, Mar 18, 2014 at 7:13 AM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Perhaps this is a bit of a thread hijack, but this discussion got me thinking about how to arrive at a more vectorized/tensorified way of specifying linear algebra

Re: [Numpy-discussion] It looks like Py 3.5 will include a dedicated infix matrix multiply operator

2014-03-16 Thread Eelco Hoogendoorn
Note that I am not opposed to extra operators in Python, and only mildly opposed to a matrix multiplication operator in numpy; but let me lay out the case against, for your consideration. First of all, the use of matrix semantics relative to array semantics is extremely rare; even in linear

Re: [Numpy-discussion] It looks like Py 3.5 will include a dedicated infix matrix multiply operator

2014-03-16 Thread Eelco Hoogendoorn
Different people work on different code and have different experiences here -- yours may or may not be typical. Pauli did some quick checks on scikit-learn, nipy, and scipy, and found that in their test suites, uses of np.dot and uses of elementwise multiplication are ~equally common:

Re: [Numpy-discussion] It looks like Py 3.5 will include a dedicated infix matrix multiply operator

2014-03-16 Thread Eelco Hoogendoorn
Ideally, the standard operator would pick a sensible default which can be inferred from the arguments, while allowing for explicit specification of the kind of algorithm used where this verbosity is worth the hassle. On Sun, Mar 16, 2014 at 5:33 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com

[Numpy-discussion] Pickling of memory aliasing patterns

2014-03-13 Thread Eelco Hoogendoorn
I have been working on a general function caching mechanism, and in doing so I stumbled upon the following quirk:

    @cached
    def foo(a, b):
        b[0] = 1
        return a[0]
    a = np.zeros(1)
    b = a[:]
    print(foo(a, b))  # computes and returns 1
    print(foo(a, b))  # gets 1
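The underlying issue is that pickling each argument copies its data, so a view and its base come back as independent arrays, and a cache key built from pickled contents cannot tell `foo(a, a[:])` apart from `foo(a, a.copy())`. Below is a sketch of one way to fold the aliasing pattern into the key. The helpers `aliasing_pattern` and `cache_key` are hypothetical, not part of the poster's `cached` decorator. `np.shares_memory` gives an exact overlap answer but can be slow for complicated strides; `np.may_share_memory` is the cheap, conservative alternative.

```python
import pickle
import numpy as np

def aliasing_pattern(*arrays):
    # For every pair of arguments, record whether they overlap in memory.
    n = len(arrays)
    return tuple(
        bool(np.shares_memory(arrays[i], arrays[j]))
        for i in range(n) for j in range(i + 1, n)
    )

def cache_key(*arrays):
    # Hypothetical key: pickled contents of each argument plus the aliasing
    # pattern, so aliased and non-aliased calls with equal contents differ.
    return pickle.dumps([np.asarray(x) for x in arrays]), aliasing_pattern(*arrays)

a = np.zeros(1)
b = a[:]          # a view: shares memory with a
c = a.copy()      # same contents, separate memory

# A pickle round trip loses the aliasing: the restored arrays are independent.
a2, b2 = pickle.loads(pickle.dumps((a, b)))
print(np.shares_memory(a2, b2))   # the view relationship did not survive
```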

Re: [Numpy-discussion] dtype promotion

2014-03-03 Thread Eelco Hoogendoorn
The tuple gets cast to an ndarray, which invokes a different code path than scalar addition. Somehow, numpy has gotten more aggressive at upcasting to float64 as of 1.8, but I haven't been able to discover the logic behind it either. On Mon, Mar 3, 2014 at 10:06 PM, Nicolas Rougier
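A minimal check of the scalar-versus-tuple difference. This is not the original example from the thread, which is cut off here. The exact promotion rules have changed across releases (NumPy 2.0 adopted NEP 50), but the two cases below should give the same dtypes under both the old value-based rules and NEP 50.

```python
import numpy as np

a = np.zeros(3, dtype=np.float32)

# A Python float scalar does not force an upcast: the result stays float32.
scalar_result = a + 1.0

# A tuple is first converted to an ndarray. A tuple of Python floats becomes
# a float64 *array*, and float32 combined with a float64 array promotes to
# float64.
tuple_result = a + (1.0, 1.0, 1.0)

print(scalar_result.dtype, tuple_result.dtype)
```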

Re: [Numpy-discussion] ANN: XDress v0.4

2014-02-27 Thread Eelco Hoogendoorn
I have; but if I recall correctly, it does not solve the problem of distributing code that uses it, or does it? On Thu, Feb 27, 2014 at 10:51 AM, Toby St Clere Smithe pyvienn...@tsmithe.net wrote: Hi, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com writes: Thanks for the heads up, I wasn't

Re: [Numpy-discussion] ANN: XDress v0.4

2014-02-27 Thread Eelco Hoogendoorn
, 2014 at 1:51 AM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Thanks for the heads up, I wasn't aware of this project. While boost.python is a very nice package, its distributability is nothing short of nonexistent, so it's great to have a pure-Python binding generator. Thanks! One

Re: [Numpy-discussion] ANN: XDress v0.4

2014-02-27 Thread Eelco Hoogendoorn
That is good to know. The boost documentation makes it appear as if bjam is the only way to build boost.python, but good to see examples to the contrary! On Thu, Feb 27, 2014 at 2:19 PM, Toby St Clere Smithe pyvienn...@tsmithe.net wrote: Eelco Hoogendoorn hoogendoorn.ee...@gmail.com writes

Re: [Numpy-discussion] ANN: XDress v0.4

2014-02-26 Thread Eelco Hoogendoorn
Thanks for the heads up, I wasn't aware of this project. While boost.python is a very nice package, its distributability is nothing short of nonexistent, so it's great to have a pure-Python binding generator. One thing which I have often found frustrating is natural ndarray interop between Python

Re: [Numpy-discussion] Help Understanding Indexing Behavior

2014-02-25 Thread Eelco Hoogendoorn
To elaborate on what Julian wrote: it is indeed simply a convention; slices/ranges in Python run from the start to one past the end. The reason for the emergence of this convention is that C code using iterators looks most natural this way. This manifests in a simple for (i = 0; i < 5; i++), but
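A few properties that make the half-open convention convenient, shown as plain assertions. The array and split points are arbitrary.

```python
import numpy as np

a = np.arange(10)
start, stop, k = 2, 7, 4

# A half-open slice a[start:stop] has exactly stop - start elements...
assert len(a[start:stop]) == stop - start

# ...and splitting at any k yields two pieces that concatenate back to the
# original with no overlap and no gap.
assert np.array_equal(np.concatenate((a[:k], a[k:])), a)

# The C loop `for (i = 0; i < 5; i++)` visits the same indices as range(0, 5).
assert list(range(0, 5)) == [0, 1, 2, 3, 4]
```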

Re: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function

2014-02-20 Thread Eelco Hoogendoorn
If the standard semantics are not affected, and the most common two-argument scenario does not take more than a single if-statement overhead, I don't see why it couldn't be a replacement for the existing np.dot; but others' mileage may vary. On Thu, Feb 20, 2014 at 11:34 AM, Stefan Otte

Re: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function

2014-02-20 Thread Eelco Hoogendoorn
of np.einsum will be hard to beat On Thu, Feb 20, 2014 at 3:27 PM, Eric Moore e...@redtetrahedron.org wrote: On Thursday, February 20, 2014, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: If the standard semantics are not affected, and the most common two-argument scenario does not take

Re: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function

2014-02-17 Thread Eelco Hoogendoorn
Considering np.dot takes only its binary positional args and a single defaulted kwarg, passing in a variable number of positional args as a list makes sense. Then just call reduce on the list (a builtin in Python 2, functools.reduce in Python 3), and there you go. I also generally approve of such semantics for binary associative
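The reduce suggestion above fits in a few lines. `mdot` here is a hypothetical helper following the thread's name. The comparison uses `np.linalg.multi_dot`, which numpy later added (in 1.11). It computes the same chained product but chooses the cheapest parenthesization instead of always going left to right.

```python
from functools import reduce
import numpy as np

def mdot(*arrays):
    # Hypothetical helper: fold np.dot over the arguments left to right,
    # i.e. mdot(a, b, c) == np.dot(np.dot(a, b), c).
    return reduce(np.dot, arrays)

rng = np.random.default_rng(1)
a = rng.standard_normal((10, 200))
b = rng.standard_normal((200, 30))
c = rng.standard_normal((30, 5))

left_to_right = mdot(a, b, c)
optimal_order = np.linalg.multi_dot([a, b, c])
print(left_to_right.shape)
```

Because matrix multiplication is associative, both orders agree up to floating-point rounding. Only the operation count differs, which is why the evaluation order can be left to the implementation.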

Re: [Numpy-discussion] Requesting Code Review of nanmedian ENH

2014-02-16 Thread Eelco Hoogendoorn
Hi David, I haven't run the code, but the _replace_nan(0) call worries me, especially considering that the unit tests seem to deal with positive numbers exclusively. Have you tested with mixed positive/negative inputs? On Sun, Feb 16, 2014 at 6:13 PM, David Freese dfre...@stanford.edu wrote:
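Why positive-only tests could hide a problem with filling NaNs with 0: any strategy that assumes the fill values end up at one end of the sorted data breaks as soon as real values straddle zero. The function below is a hypothetical strategy, not the code under review. It is only meant to show the kind of case mixed-sign tests would catch.

```python
import numpy as np

def median_zero_fill_then_skip(x):
    # Hypothetical strategy (NOT the patch under review): fill NaNs with 0,
    # sort, and skip the first n_nan entries on the assumption that the
    # fills sort to the front. That only holds when all real values are >= 0.
    n_nan = np.count_nonzero(np.isnan(x))
    filled = np.sort(np.where(np.isnan(x), 0.0, x))
    return np.median(filled[n_nan:])

positive = np.array([2.0, 3.0, np.nan])
negative = np.array([-3.0, -2.0, np.nan])

# Positive inputs: the zero sorts first and is skipped, so this matches nanmedian.
print(median_zero_fill_then_skip(positive), np.nanmedian(positive))

# Negative inputs: the zero sorts last, a real value is skipped instead,
# and the result is wrong.
print(median_zero_fill_then_skip(negative), np.nanmedian(negative))
```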

Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Eelco Hoogendoorn
My guess: first of all, you are actually manipulating twice as much data as an in-place sort. Moreover, an in-place sort gains locality as the data is being sorted, whereas argsort continuously makes completely random memory accesses.
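A sketch of the two code paths being compared. No timings are claimed here, since those depend on the machine and numpy version. The point is structural: `argsort` produces a separate index array of `np.intp` (8 bytes per element on typical 64-bit builds, the same as the float64 data), and applying those indices afterwards is a gather with an essentially random access pattern.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(100_000)

# In-place sort: only the data array itself is reordered.
b = a.copy()
b.sort()

# argsort: permutes an index array, then a[idx] gathers the data through it.
idx = np.argsort(a)
gathered = a[idx]

print(a.nbytes, idx.nbytes)   # equal on typical 64-bit platforms
```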

Re: [Numpy-discussion] libflatarray

2014-02-13 Thread Eelco Hoogendoorn
As usual, 'it depends', but a struct of arrays layout (which is a virtual necessity on GPUs) can also be advantageous on the CPU. One rarely acts on only a single object at a time; but quite often, you only work on a subset of the objects' attributes at a time. In an array of structs layout, you
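In numpy terms, a structured array is an array-of-structs layout, while separate per-attribute arrays form a struct-of-arrays layout. The sketch below compares the memory stride seen when reading a single attribute from each. The field names and sizes are arbitrary.

```python
import numpy as np

n = 1000
# Array of structs: one 24-byte record per object, fields interleaved in memory.
aos = np.zeros(n, dtype=[('x', np.float64), ('y', np.float64), ('z', np.float64)])

# Struct of arrays: one contiguous float64 array per attribute.
soa = {name: np.zeros(n) for name in ('x', 'y', 'z')}

# Reading only 'x' from the AoS layout skips over whole records (24-byte
# stride), so two thirds of every cache line fetched is unused. The SoA 'x'
# array is densely packed (8-byte stride).
print(aos['x'].strides, soa['x'].strides)
```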

Re: [Numpy-discussion] deprecate numpy.matrix

2014-02-11 Thread Eelco Hoogendoorn
My 2pc: I personally hardly ever use matrix, even in dense linear algebra code. It can be nice, though, to use matrix semantics within a restricted scope. When I first came to numpy, the ability to choose linear algebra versus array semantics seemed like a really neat thing to me; though in
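For reference, the two semantics side by side on plain ndarrays. With `np.matrix`, `*` would mean the matrix product. On ndarrays, `*` is elementwise and the matrix product is spelled `np.dot` or, since Python 3.5 and numpy 1.10, the `@` operator discussed in the threads above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Array semantics: * is elementwise.
elementwise = a * b      # [[5, 12], [21, 32]]

# Linear algebra semantics on ndarrays: np.dot or the @ operator.
product = a @ b          # [[19, 22], [43, 50]]

print(elementwise)
print(product)
```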
