Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

2016-07-06 Thread Eric Firing

On 2016/07/06 8:25 AM, Benjamin Root wrote:

I wouldn't have the keyword be "where", as that collides with the notion
of "where" elsewhere in numpy.


Agreed.  Maybe "side"?

(I find atleast_1d and atleast_2d to be very helpful for handling 
inputs, as Ben noted; I'm skeptical as to the value of atleast_3d and 
atleast_nd.)


Eric


Re: [Numpy-discussion] Question about nump.ma.polyfit

2015-12-15 Thread Eric Firing

On 2015/12/14 6:39 PM, Samuel Dupree wrote:

I'm running Python 2.7.11 from the Anaconda distribution (version 2.4.1)
on a MacBook Pro running Mac OS X version 10.11.2 (El Capitan)

I'm attempting to use numpy.ma.polyfit to perform a linear least square
fit on some data I have. I'm running NumPy version 1.10.1. I've observed
that in executing either numpy.polyfit or numpy.ma.polyfit I get the
following traceback:

/Users/user/anaconda/lib/python2.7/site-packages/numpy/lib/polynomial.py:594:
RankWarning: Polyfit may be poorly conditioned
   warnings.warn(msg, RankWarning)
Traceback (most recent call last):
   File "ComputeEnergy.py", line 132, in 
 coeffs, covar = np.ma.polyfit( xfit, yfit, fit_degree,
rcond=rcondv, cov=True )
   File
"/Users/user/anaconda/lib/python2.7/site-packages/numpy/ma/extras.py",
line 1951, in polyfit
 return np.polyfit(x, y, deg, rcond, full, w, cov)
   File
"/Users/user/anaconda/lib/python2.7/site-packages/numpy/lib/polynomial.py",
line 607, in polyfit
 return c, Vbase * fac
ValueError: operands could not be broadcast together with shapes (6,6) (0,)


I've attached a stripped down version of the Python program I'm running.


Sam,

That is not stripped down very far; it's still not something someone on 
the list can run.




Any suggestions?


Use debugging techniques to figure out what is going on inside your 
script.  In particular, what are the arguments that polyfit is choking 
on?  I would run the script in ipython and use the %debug magic to drop 
into the debugger when it fails.  Then use "up" to move up the stack 
until you get to the line calling polyfit, and then use the print 
function to print each of the arguments.  Chances are, either they will 
not be what you expect them to be, or they will, but you will find a 
logical inconsistency among them.  It looks like you are using Spyder, 
presumably with the ipython console, so run your script, then when it 
fails type "%debug" in the ipython console window and you will be 
dropped into the standard pdb debugger.
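
To make that concrete, the session would look roughly like this (a sketch;
the names xfit, yfit, fit_degree, and rcondv are taken from the traceback
above):

In [1]: %run ComputeEnergy.py      # fails with the ValueError shown above
In [2]: %debug                     # drop into pdb at the point of failure
ipdb> up                           # repeat until you reach the polyfit call
ipdb> print(xfit.shape, yfit.shape, fit_degree, rcondv)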


Eric



Sam Dupree.







Re: [Numpy-discussion] Recognizing a cycle in a vector

2015-12-03 Thread Eric Firing

On 2015/12/02 10:45 PM, Manolo Martínez wrote:

1) this func sorts the absolute value of the amplitudes to find the two
most important  components, and this seems overkill for large vectors.


Try

inds = np.argpartition(-np.abs(ft), 2)[:2]

Now inds holds the indices of the two largest components.
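
For example (a minimal sketch; ft here is just a stand-in for the FFT
output in your code):

import numpy as np

ft = np.fft.fft(np.random.randn(1024))        # stand-in spectrum
inds = np.argpartition(-np.abs(ft), 2)[:2]    # partial sort: O(n), not O(n log n)
# Note: inds is not ordered by magnitude; sort these two if order matters.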

Eric


Re: [Numpy-discussion] When to stop supporting Python 2.6?

2015-12-03 Thread Eric Firing

On 2015/12/03 12:47 PM, Charles R Harris wrote:

Hi All,

Thought I would raise the topic apropos this post. There is not a great
advantage to dropping 2.6; OTOH, 2.7 has more features (memoryview) and we
could clean up the code a bit.

Along the same lines, dropping support for Python 3.2 would allow more
cleanup. In fact, I'd like to get to 3.4 as soon as possible, but don't
know what would be a reasonable schedule. The Python 3 series might be
easier to move forward on, as I think that Python 3 is just now starting
to become the dominant version in some areas.

Chuck



Chuck,

I would support dropping the old versions now.  As a related data point, 
matplotlib is testing master on 2.7, 3.4, and 3.5--no more 2.6 and 3.3.


Eric



Re: [Numpy-discussion] future of f2py and Fortran90+

2015-12-03 Thread Eric Firing

On 2015/12/03 11:08 AM, Yuxiang Wang wrote:

To add to Sturla's point - I think this is what he mentioned, but in more detail:

http://www.fortran90.org/src/best-practices.html#interfacing-with-python


Right, but that approach requires writing two wrappers for each function, one 
in Fortran and a second one in Cython.  Even though they are very simple, 
this would be cumbersome for a library with more than a few functions. 
Therefore I think there is still a place for f2py and f90wrap, and I am 
happy to see development continuing at least on the latter.


Eric



Shawn

On Tue, Jul 14, 2015 at 9:45 PM, Sturla Molden <sturla.mol...@gmail.com> wrote:

Eric Firing <efir...@hawaii.edu> wrote:


I'm curious: has anyone been looking into what it would take to enable
f2py to handle modern Fortran in general?  And into prospects for
getting such an effort funded?


No need. Use Cython and Fortran 2003 ISO C bindings. That is the only
portable way to interop between Fortran and C (including CPython) anyway.









Re: [Numpy-discussion] Feedback on new argument positions for ma.dot and MaskedArray.dot

2015-11-08 Thread Eric Firing

On 2015/11/08 3:46 PM, Charles R Harris wrote:

Hi All,

I'd like some feedback for the position of the `strict` and `out`
arguments for masked arrays. See gh-6653 for the PR in question.

Current status without #6652

 1. ma.dot(a, b, strict=False) -- established
 2. a.dot(b, out=None) -- new in 1.10


Note that 1. requires adding `out` to the end for backward
compatibility. OTOH, 2. is new(ish). We can either keep it compatible
with ndarray.dot and add `strict` to the end and have it incompatible
with 1., or, slightly changing it in 1.10.2, make it compatible with
1. but incompatible with ndarray. We will face the same sort of
problem when adding newer ndarray arguments to other existing ma functions
that have their own specialized arguments, so having a policy up front
will be helpful. My own inclination here is to keep 1. and 2.
compatible, and then perhaps at some point following a future warning,
make both `strict` and `out` keyword arguments only. Another possibility
is to make that transition immediate for the method.


I'm not sure about the best sequence, but I like the strategy of moving 
to keyword-only arguments.  It is good for readability, and for flexibility.


I also prefer that there be a single convention: either the "out" kwarg 
goes at the end of every signature, or it is the first kwarg in every 
signature.  It's a very special and unusual kwarg, so it should have a 
standard location.


Eric



Thoughts?

Chuck







Re: [Numpy-discussion] Problem while writing array with np.savetxt

2015-09-24 Thread Eric Firing

On 2015/09/23 9:17 PM, Andrew Nelson wrote:

Dear list,
whilst trying to write an array to disk I am coming across the
following.  What am I doing wrong?  Surely f is a file handle?
(python 3.4.3, numpy 1.9.2)

import numpy as np
a = np.arange(10.)
with open('test.dat', 'w') as f:
 np.savetxt(f, a)



It will work if you open with 'wb'.  Yes, this seems like a bug; or at 
least, the anomaly should be noted in the docstring.
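
In other words, a sketch of the workaround, using the versions quoted above:

import numpy as np

a = np.arange(10.)
with open('test.dat', 'wb') as f:   # binary mode; 'w' triggers the TypeError
    np.savetxt(f, a)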


Eric


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      2 a = np.arange(10.)
      3 with open('test.dat', 'w') as f:
----> 4     np.savetxt(f, a)

/Users/anz/Documents/Andy/programming/dev3/lib/python3.4/site-packages/numpy/lib/npyio.py
in savetxt(fname, X, fmt, delimiter, newline, header, footer, comments)
   1085         else:
   1086             for row in X:
-> 1087                 fh.write(asbytes(format % tuple(row) + newline))
   1088         if len(footer) > 0:
   1089             footer = footer.replace('\n', '\n' + comments)

TypeError: must be str, not bytes




--
_
Dr. Andrew Nelson


_







Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-09-04 Thread Eric Firing
On 2015/09/04 10:53 AM, Matthew Brett wrote:
> On Fri, Sep 4, 2015 at 2:33 AM, Matthew Brett  wrote:
>> Hi,
>>
>> On Wed, Sep 2, 2015 at 5:41 PM, Chris Barker  wrote:
>>> 1) I very much agree that governance can make or break a project. However,
>>> the actual governance approach often ends up making less difference than the
>>> people involved.
>>>
>>> 2) While the FreeBSD and XFree examples do point to some real problems with
>>> the "core" model it seems that there are many other projects that are using
>>> it quite successfully.
>
> I was just rereading the complaints about the 'core' structure from
> high-level NetBSD project leaders:
>
> "[the "core" and "board of directors"] teams are dysfunctional because
> they do not provide leadership: all they do is act reactively to
> requests from users and/or to resolve internal disputes. In other
> words: there is no initiative nor vision emerging from these teams
> (and, for that matter, from anybody)." [1]
>
> "There is no high-level direction; if you ask "what about the problems
> with threads" or "will there be a flash-friendly file system", the
> best you'll get is "we'd love to have both" -- but no work is done to
> recruit people to code these things, or encourage existing developers
> to work on them." [2]


This is consistent with Chris's first point.

>
> I imagine we will have to reconcile ourselves to similar problems, if
> we adopt the same structures.

Do you have suggestions as to who would make a good numpy president or 
BDFL and potentially has the time and inclination to do it, or how to 
identify and recruit such a person?

Eric

>
> Cheers,
>
> Matthew
>
> [1] 
> http://julipedia.meroh.net/2013/06/self-interview-after-leaving-netbsd.html
> [2] http://mail-index.netbsd.org/netbsd-users/2006/08/30/0016.html



Re: [Numpy-discussion] Numpy FFT.FFT slow with certain samples

2015-08-28 Thread Eric Firing
On 2015/08/28 10:36 AM, Sebastian Berg wrote:
 If you don't mind the extra dependency or licensing and this is an issue
 for you, you can try pyfftw (there are likely other similar projects)
 which wraps fftw and does not have this problem as far as I know. It
 exposes a numpy-like interface.

Sort of; that interface returns a function, not the result.

fftw is still an fft algorithm, so it is still subject to a huge 
difference in run time depending on how the input array can be factored.

Furthermore, it gets its speed by figuring out how to optimize a 
calculation for a given size of input array.  That initial optimization 
can be very slow.  The overall speed gain is realized only when one 
saves the result of that optimization, and applies it to many 
calculations on arrays of the same size.
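
A sketch of that pattern, assuming the pyfftw "builders" interface (check
the pyfftw docs for the exact API; "blocks" is a hypothetical iterable of
same-size arrays):

import numpy as np
import pyfftw   # assumed installed; not part of numpy or scipy

a = pyfftw.empty_aligned(2646070, dtype='complex128')
fft_func = pyfftw.builders.fft(a)   # slow: plans (optimizes) the transform once
for block in blocks:
    a[:] = block
    spectrum = fft_func()           # fast: reuses the saved plan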

Eric


 - sebastian


 On Fr, 2015-08-28 at 19:13 +, Joseph Codadeen wrote:
 Great, thanks Stefan and everyone.

 From: stef...@berkeley.edu
 To: numpy-discussion@scipy.org
 Date: Fri, 28 Aug 2015 12:03:52 -0700
 Subject: Re: [Numpy-discussion] Numpy FFT.FFT slow with certain
 samples


 On 2015-08-28 11:51:47, Joseph Codadeen jdm...@hotmail.com
 wrote:
 my_1_minute_noise_with_gaps_truncated - Array len is 2646070
 my_1_minute_noise_with_gaps - Array len is 2649674

 In [6]: from sympy import factorint

 In [7]: max(factorint(2646070))
 Out[7]: 367

 In [8]: max(factorint(2649674))
 Out[8]: 1324837

 Those numbers give you some indication of how long the FFT will
 take to compute.

 Stéfan








Re: [Numpy-discussion] Problems using add_npy_pkg_config

2015-08-12 Thread Eric Firing
I used to use scons, but I've been pretty happy with switching to waf.
(Very limited use in both cases: two relatively simple packages.)  One
of the nicest things is how light it is--no external dependencies,
everything can be included in the package itself.


[Numpy-discussion] future of f2py and Fortran90+

2015-07-14 Thread Eric Firing
F2py is a great tool, but my impression is that it is being left behind 
by the evolution of Fortran from F90 onward.  This is unfortunate; it 
would be nice to be able to easily wrap new Fortran libraries.

I'm curious: has anyone been looking into what it would take to enable 
f2py to handle modern Fortran in general?  And into prospects for 
getting such an effort funded?

Eric


Re: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: fancy vs. orthogonal)

2015-04-08 Thread Eric Firing
On 2015/04/08 8:09 AM, Alan G Isaac wrote:
 That analogy fails because it suggests a private conversation. This list is 
 extremely public.
 For example, I am just a user, and I am on it.  I can tell you that as a 
 long-time numpy user
 my reaction to the proposal to change indexing semantics was (i) OMG YMBFKM 
 and then
 (ii) take a breath; this too will fade away.  It is very reasonable to worry 
 that some users
 will start at the same place but then move in a different direction, and that 
 worry should
 affect how such proposals are floated and discussed.  I am personally 
 grateful that the
 idea's reception has been so chilly; it's very reassuring.

OK, so I was not sufficiently tactful when I tried to illustrate the 
real practical problem associated with a *core* aspect of numpy.  My 
intent was not to alarm users, and I apologize if I have done so. I'm 
glad you have been reassured. I know perfectly well that 
back-compatibility and stability are highly important.  What I wanted to 
do was to stimulate thought about how to handle a serious challenge to 
numpy's future--short-term, and long-term.  Jaime's PR is a very welcome 
response to that challenge, but it might not be the end of the story. 
Matthew nicely sketched out one possible scenario, or actually a range 
of scenarios.

Now, can we please get back to consideration of reasonable options? 
What sequence of steps might reduce the disconnect between numpy and the 
rest of the array-handling world?  And make it a little friendlier for 
students?

Are there *any* changes to indexing, whether by default or as an option, 
that would help?  Consider the example I started with, in which indexing 
with [1, :, array] gives results that many find surprising and hard to 
understand.  Might it make sense to *slowly* deprecate this?  Or are 
such indexing expressions actually useful?  If they are, would it be out 
of the question to have them *optionally* trigger a warning, so that 
numpy could be configured to be a little less likely to trip up a 
non-expert user?

Eric


 fwiw,
 Alan


 On 4/7/2015 9:06 PM, Nathaniel Smith wrote:
 If a grad student or junior colleague comes to you with an
 idea where you see some potentially critical flaw, do you
 yell THAT WILL NEVER WORK and kick them out of your
 office? Or, do you maybe ask a few leading questions and
 see where they go?





Re: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: fancy vs. orthogonal)

2015-04-08 Thread Eric Firing
On 2015/04/08 9:40 AM, Ralf Gommers wrote:
 Their proposal is not being discussed; instead that potentially useful
 discussion is being completely derailed by insisting on wanting to talk
 about changes to numpy's indexing behavior.

Good point.  That was an unintended consequence of my message.

Eric


Re: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: fancy vs. orthogonal)

2015-04-08 Thread Eric Firing
On 2015/04/08 10:02 AM, Alan G Isaac wrote:

 3. I admit, my students are NOT using non-boolean fancy indexing on
 multidimensional arrays. (As far as I know.)  Are yours?

Yes, one attempted to, essentially by accident.  That was in my original 
message.  Please refer back to that.  The earlier part of this thread, 
under its original name, is also relevant to your other questions.

I'm not going to discuss this further.  The thread is now closed as far 
as I am concerned.

Eric



Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-03 Thread Eric Firing
On 2015/04/03 7:59 AM, Jaime Fernández del Río wrote:
 I have an all-Python implementation of an OrthogonalIndexer class,
 loosely based on Stephan's code plus some axis remapping, that provides
 all the needed functionality for getting and setting with orthogonal
 indices.

Excellent!


 Would those interested rather see it as a gist to play around with, or
 as a PR adding an orthogonally indexable `.ix_` argument to ndarray?

I think the PR would be easier to test.

Eric


 Jaime

 --
 (\__/)
 ( O.o)
 (  ) This is Conejo. Copy Conejo into your signature and help him in his
 plans for world domination.








Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Eric Firing
On 2015/04/02 10:22 AM, josef.p...@gmail.com wrote:
 Swapping the axis when slices are mixed with fancy indexing was a
 design mistake, IMO. But not fancy indexing itself.

I'm not saying there should be no fancy indexing capability; I am saying 
that it should be available through a function or method, rather than 
via the square brackets.  Square brackets should do things that people 
expect them to do--the most common and easy-to-understand style of indexing.

Eric


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Eric Firing
On 2015/04/02 1:14 PM, Hanno Klemm wrote:
 Well, I have written quite a bit of code that relies on fancy
 indexing, and I think the question, if the behaviour of the []
 operator should be changed has sailed with numpy now at version 1.9.
 Given the amount packages that rely on numpy, changing this
 fundamental behaviour would not be a clever move.

Are you *positive* that there is no clever way to make a transition? 
It's not worth any further thought?


 If people want to implement orthogonal indexing with another method,
 by all means I might use it at some point in the future. However,
 adding even more complexity to the behaviour of the bracket slicing
 is probably not a good idea.

I'm not advocating adding even more complexity, I'm trying to think 
about ways to make it *less* complex from the typical user's standpoint.

Eric


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Eric Firing
On 2015/04/02 4:15 AM, Jaime Fernández del Río wrote:
 We probably need more traction on the "should this be done?" discussion
 than on the "can this be done?" one, the need for a reordering of the
 axes swings me slightly in favor, but I mostly don't see it yet.

As a long-time user of numpy, and an advocate and teacher of Python for 
science, here is my perspective:

Fancy indexing is a horrible design mistake--a case of cleverness run 
amok.  As you can read in the Numpy documentation, it is hard to 
explain, hard to understand, hard to remember.  Its use easily leads to 
unreadable code and hard-to-see errors.  Here is the essence of an 
example that a student presented me with just this week, in the context 
of reordering eigenvectors based on argsort applied to eigenvalues:

In [25]: xx = np.arange(2*3*4).reshape((2, 3, 4))

In [26]: ii = np.arange(4)

In [27]: print(xx[0])
[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

In [28]: print(xx[0, :, ii])
[[ 0  4  8]
  [ 1  5  9]
  [ 2  6 10]
  [ 3  7 11]]

Quickly now, how many numpy users would look at that last expression and 
say, "Of course, that is equivalent to transposing xx[0]"?  And, "Of 
course that expression should give a completely different result from 
xx[0][:, ii]"?

I would guess it would be less than 1%.  That should tell you right away 
that we have a real problem here.  Fancy indexing can't be *read* by a 
sub-genius--it has to be laboriously figured out piece by piece, with 
frequent reference to the baffling descriptions in the Numpy docs.

So I think you should turn the question around and ask, "What is the 
actual real-world use case for fancy indexing?  How often does real 
code rely on it?"  I have taken advantage of it occasionally, maybe you 
have too, but I think a survey of existing code would show that the need 
for it is *far* less common than the need for simple orthogonal 
indexing.  That tells me that it is fancy indexing, not orthogonal 
indexing, that should be available through a function and/or special 
indexing attribute.  The question is then how to make that transition.
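
For reference, the orthogonal behavior that most users expect can already
be spelled explicitly with np.ix_ (a sketch using the arrays above):

import numpy as np

xx = np.arange(2*3*4).reshape((2, 3, 4))
ii = np.arange(4)
rows = np.arange(3)
print(xx[0][np.ix_(rows, ii)])   # orthogonal: same as xx[0][:, ii], shape (3, 4)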

Eric







Re: [Numpy-discussion] Rewrite np.histogram in c?

2015-03-23 Thread Eric Firing
On 2015/03/23 7:36 AM, Ralf Gommers wrote:


 On Mon, Mar 23, 2015 at 2:59 PM, Daniel da Silva
 var.mail.dan...@gmail.com wrote:

 Hope this isn't too off-topic: but it would be very nice if
 np.histogram and np.histogram2d supported masked arrays. Is this out
 of scope for outside the numpy.ma package?


 Right now it looks like there's no histogram function at all for masked
 arrays - would be good to improve that situation.

 If it's as easy as adding to np.histogram something like:

      if isinstance(a, np.ma.MaskedArray):
          a = a.data[~a.mask]

It looks like it requires a little more than that, but not much.  For 
full support a new mask would need to be made from the logical_or of the 
a mask and the weights mask, and then used to compress both a and 
weights.
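
A minimal sketch of that logic (a hypothetical helper, not the actual PR):

import numpy as np
import numpy.ma as ma

def masked_histogram(a, weights=None, **kw):
    # Combine the masks of a and weights, then compress both.
    a = ma.asanyarray(a)
    bad = ma.getmaskarray(a)
    if weights is not None:
        weights = ma.asanyarray(weights)
        bad = bad | ma.getmaskarray(weights)   # logical_or of the two masks
        weights = weights.data[~bad]
    return np.histogram(a.data[~bad], weights=weights, **kw)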

Eric


 then it makes sense to add that I think.

 Ralf



 On Mon, Mar 16, 2015 at 2:35 PM, Robert McGibbon rmcgi...@gmail.com wrote:

 Hi,

 It sounds like putting together a PR makes sense then. I'll try
 hacking on this a bit.

 -Robert

 On Mar 16, 2015 11:20 AM, Jaime Fernández del Río
 jaime.f...@gmail.com wrote:

 On Mon, Mar 16, 2015 at 9:28 AM, Jerome Kieffer
 jerome.kief...@esrf.fr wrote:

 On Mon, 16 Mar 2015 06:56:58 -0700
 Jaime Fernández del Río jaime.f...@gmail.com wrote:

  Dispatching to a different method seems like a no brainer 
 indeed. The
  question is whether we really need to do this in C.

 I need to do both unweighted  weighted histograms and
 we got a factor 5 using (simple) cython:
 it is in the proceedings of Euroscipy, last year.
 http://arxiv.org/pdf/1412.6367.pdf


 If I read your paper and code properly, you got 5x faster,
 mostly because you combined the weighted and unweighted
 histograms into a single search of the array, and because
 you used an algorithm that can only be applied to equal-
 sized bins, similarly to the 10x speed-up Robert was reporting.

 I think that having a special path for equal sized bins is a
 great idea: let's do it, PRs are always welcome!
 Similarly, getting the counts together with the weights
 seems like a very good idea.

 I also think that writing it in Python is going to take us
 80% of the way there: most of the improvements both of you
 have reported are not likely to be coming from the language
 chosen, but from the algorithm used. And if C proves to be
 sufficiently faster to warrant using it, it should be
 confined to the number crunching: I don't think there is any
 point in rewriting argument parsing in C.

 Also, keep in mind `np.histogram` can now handle arrays of
 just about **any** dtype. Handling that complexity in C is
 not a ride in the park. Other functions like `np.bincount`
 and `np.digitize` cheat by only handling `double` typed
 arrays, a luxury that histogram probably can't afford at
 this point in time.

 Jaime

 --
 (\__/)
 ( O.o)
 (  ) This is Conejo. Copy Conejo into your signature and help him
 in his plans for world domination.














Re: [Numpy-discussion] Fix masked arrays to properly edit views

2015-03-14 Thread Eric Firing
On 2015/03/14 1:02 PM, John Kirkham wrote:
 The sample case of the issue (
 https://github.com/numpy/numpy/issues/5558 ) is shown below. A proposal
 to address this behavior can be found here (
 https://github.com/numpy/numpy/pull/5580 ). Please give me your feedback.


 I tried to change the mask of `a` through a subindexed view, but was
 unable. Using this setup I can reproduce this in the 1.9.1 version of NumPy.

  import numpy as np

  a = np.arange(6).reshape(2,3)
  a = np.ma.masked_array(a, mask=np.ma.getmaskarray(a), shrink=False)

  b = a[1:2,1:2]

  c = np.zeros(b.shape, b.dtype)
  c = np.ma.masked_array(c, mask=np.ma.getmaskarray(c), shrink=False)
  c[:] = np.ma.masked

 This yields what one would expect for `a`, `b`, and `c` (seen below).

   masked_array(data =
 [[0 1 2]
  [3 4 5]],
mask =
 [[False False False]
  [False False False]],
   fill_value = 99)

   masked_array(data =
 [[4]],
mask =
 [[False]],
   fill_value = 99)

   masked_array(data =
 [[--]],
mask =
 [[ True]],
   fill_value = 99)

 Now, it would seem reasonable that to copy data into `b` from `c` one
 can use `__setitem__` (seen below).

   b[:] = c

 This results in new data and mask for `b`.

   masked_array(data =
 [[--]],
mask =
 [[ True]],
   fill_value = 99)

 This should, in turn, change `a`. However, the mask of `a` remains
 unchanged (seen below).

   masked_array(data =
 [[0 1 2]
  [3 0 5]],
mask =
 [[False False False]
  [False False False]],
   fill_value = 99)



I agree that this behavior is wrong.  A related oddity is this:

In [24]: a = np.arange(6).reshape(2,3)
In [25]: a = np.ma.array(a, mask=np.ma.getmaskarray(a), shrink=False)
In [27]: a.sharedmask
Out[27]: True
In [28]: a.unshare_mask()
In [30]: b = a[1:2, 1:2]
In [31]: b[:] = np.ma.masked
In [32]: b.sharedmask
Out[32]: False
In [33]: a
masked_array(data =
  [[0 1 2]
  [3 -- 5]],
  mask =
  [[False False False]
  [False  True False]],
fill_value = 99)

It looks like the sharedmask property simply is not being set and 
interpreted correctly--a freshly initialized array has sharedmask True; 
and after setting it to False, changing the mask of a new view *does* 
change the mask in the original.

Eric


 Best,
 John






Re: [Numpy-discussion] numpy pickling problem - python 2 vs. python 3

2015-03-06 Thread Eric Firing
On 2015/03/06 1:29 PM, Julian Taylor wrote:
 I think the ship for a warning has long sailed. At this point its
 probably more an annoyance for python3 users and will not prevent many
 more python2 users from saving files that can't be loaded into python3.

The point of a warning is that anything that relies on pickles is 
fundamentally unreliable in the long term.  It's potentially a surprise 
that the npz format relies on pickles.


Re: [Numpy-discussion] numpy pickling problem - python 2 vs. python 3

2015-03-06 Thread Eric Firing
On 2015/03/06 10:23 AM, Pauli Virtanen wrote:
 06.03.2015, 20:00, Benjamin Root wrote:
 A slightly different way to look at this is one of sharing data. If I am
 working on a system with 3.4 and I want to share data with others who may
 be using a mix of 2.7 and 3.3 systems, this problem makes npz format much
 less attractive.

 pickle is used in npy files only if there are object arrays in them.
 Of course, savez could just decline saving object arrays.

Or issue a prominent warning.

Eric



Re: [Numpy-discussion] Silent Broadcasting considered harmful

2015-02-08 Thread Eric Firing
On 2015/02/08 12:43 PM, josef.p...@gmail.com wrote:


 For me the main behavior I had to adjust to was losing a dimension in
 any reduce operation, mean, sum, ...

 if x is 2d
 x - x.mean(1)
 we lose a dimension, and it doesn't broadcast in the right direction

Though you can use:

x_demeaned = x - np.mean(x, axis=1, keepdims=True)
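
A quick demonstration of the difference (a sketch):

import numpy as np

x = np.arange(12.).reshape(3, 4)
# x - x.mean(1)                    # ValueError: (3,4) and (3,) don't broadcast
x - x.mean(1, keepdims=True)       # OK: (3,4) - (3,1) broadcasts row-wise
x - x.mean(0)                      # OK: (3,4) - (4,) broadcasts column-wise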


 x - x.mean(0)
 perfect, no `repeat` needed, it just broadcasts the way we need.

 Josef



Re: [Numpy-discussion] Datetime again

2015-01-28 Thread Eric Firing
On 2015/01/28 6:29 PM, Charles R Harris wrote:


 And as for "The 64 bits of long long really isn't enough and leads
 to all sorts of compromises": not long enough for what? I've always
 thought that what we need is the ability to set the epoch. Does
 anyone ever need picoseconds since 100 years ago? And if they did,
 we'd be in a heck of a mess with leap seconds and all that anyway.


 I was thinking elapsed time. Nanoseconds can be rather crude for that
 depending on the measurement. Of course, such short times aren't going
 to come from the system clock, but data collected in other ways,
 interference between light pulses over microscopic distances for
 instance. Such data is likely acquired as, or computed, from simple
 numbers with a unit, which gets us back to the numpy version. But that
 complicates the heck out of things when you want to start adding times
 in different units.

Chuck,

For any kind of data like that, I fail to see why any special numpy time 
type is needed at all.  Wouldn't the user just keep elapsed time as a 
count, or floating point number, in whatever units the instrument spits 
out?  Why does it need to be treated in a different way from any other 
numeric data?  We don't have special types for length. It seems to me 
that numpy's present experimental datetime64 type has already fallen 
into the trap of overengineering--trying to be too many things to too 
many people.  The main reason for having a special datetime type is to 
deal with the calendar mess, and conventional hours-minutes-seconds 
time.  For very short time intervals, all that is irrelevant.

Eric


Re: [Numpy-discussion] EDF+ specification

2015-01-20 Thread Eric Firing
Nathaniel,

I don't know what sequence of wrong button pushes led to this, but the 
message was intended for Io Flament.  Sorry for the puzzling disruption!

Eric

On 2015/01/20 1:17 PM, Nathaniel Smith wrote:
 On Tue, Jan 20, 2015 at 10:51 PM, Eric Firing efir...@hawaii.edu wrote:
 http://www.edfplus.info/specs/edfplus.html#additionalspecs

 Io,  Is this the file format you have?

 Sorry, I don't quite understand the question!

 Maybe you're looking for

 https://github.com/breuderink/eegtools
 https://github.com/rays/pyedf
 https://bitbucket.org/cleemesser/python-edf/

 ...?




[Numpy-discussion] EDF+ specification

2015-01-20 Thread Eric Firing
http://www.edfplus.info/specs/edfplus.html#additionalspecs

Io,  Is this the file format you have?

Eric


Re: [Numpy-discussion] Detect if array has been transposed

2014-10-12 Thread Eric Firing
On 2014/10/12, 8:29 AM, Pauli Virtanen wrote:
 12.10.2014, 20:19, Mads Ipsen wrote:
 Is there any way for me to detect (on the Python side) that transpose()
 has been invoked on the matrix, and thereby only do the copy operation
 when it really is needed?

 The correct way to do this is to, either:

 In your C code check PyArray_IS_C_CONTIGUOUS(obj) and raise an error if
 it is not. In addition, on the Python side, check for
 `a.flags.c_contiguous` and make a copy if it is not.

 OR

 In your C code, get a handle to the array using PyArray_FromAny (or
 PyArray_FROM_OTF) with the NPY_ARRAY_C_CONTIGUOUS requirement set so that it
 makes a copy when necessary.


or let numpy handle it on the python side:

foo(numpy.ascontiguousarray(a))
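
For example (a sketch; np.ascontiguousarray copies only when it has to):

import numpy as np

a = np.arange(6).reshape(2, 3).T       # transposed view: not C-contiguous
print(a.flags.c_contiguous)            # False
b = np.ascontiguousarray(a)            # makes a C-contiguous copy
print(b.flags.c_contiguous)            # True
print(np.ascontiguousarray(b) is b)    # True: already contiguous, no copy made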







Re: [Numpy-discussion] Numpy 'None' comparison FutureWarning

2014-09-21 Thread Eric Firing
On 2014/09/21, 11:10 AM, Demitri Muna wrote:
 Hi,

 I just encountered the following in my code:

 FutureWarning: comparison to `None` will result in an elementwise object
 comparison in the future.

 I'm very concerned about this. This is a very common programming pattern
 (lazy loading):

 class A(object):
  def __init__(self):
  self._some_array = None

  @property
  def some_array(self):
  if self._some_array == None:
  # perform some expensive setup of array
  return self._some_array

 It seems to me that the new behavior will break this pattern. I think
 that redefining the == operator is a little too aggressive here. It
 strikes me as very nonstandard and not at all obvious to someone reading
 the code that the comparison is a very special case for numpy objects.
 Unless there's some aspect I'm missing here, I think an element-wise
 comparator should be more explicit.


I think what you are missing is that the standard Python idiom for this 
use case is "if self._some_array is None:".  This will continue to work, 
regardless of whether the object being checked is an ndarray or any 
other Python object.
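
The pattern above, rewritten with the identity check (a sketch; the setup
line is a placeholder for the expensive computation):

import numpy as np

class A(object):
    def __init__(self):
        self._some_array = None

    @property
    def some_array(self):
        if self._some_array is None:          # identity test, never elementwise
            self._some_array = np.arange(10)  # placeholder for expensive setup
        return self._some_array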

Eric


 Cheers,
 Demitri

 _
 Demitri Muna

 Department of Astronomy
 The Ohio State University

 http://trillianverse.org
 http://scicoder.org









Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-06 Thread Eric Firing
On 2014/07/06, 11:43 AM, Nathaniel Smith wrote:
 On Sun, Jul 6, 2014 at 9:35 PM, Daniel da Silva
 var.mail.dan...@gmail.com wrote:
 The idea is that there be a short-hand for creating arrays as there is for
 matrices:

np.mat('.2 .7 .1; .3 .5 .2; .1 .1 .9')

 It was suggested in GitHub issue #4817 in light that it would be beneficial
 to beginners and to presenters during demonstrations.  In GitHub pull
 request #484, I implemented this as the np.arr function.

 Does anyone have any feedback on the API details? Some examples from my
 implementation follow.

 np.arr('3; 4; 5')
  array([[3],
 [4],
 [5]])

 np.arr('3; 4; 5', dtype=float)
  array([[ 3.],
 [ 4.],
 [ 5.]])

 np.arr('1 0 0; 0 1 0; 0 0 1')
  array([[1, 0, 0],
 [0, 1, 0],
 [0, 0, 1]])

 np.arr('4, 5; 6, 7')
  array([[4, 5],
 [6, 7]])

 It occurs to me that np.mat always returns a 2d matrix, but for arrays
 there are more options.

 What should np.arr('1 2 3') return? a 1d array or a 2d row vector?

I would say 1d array.  This is numpy, not numpy.matrix.

 (Maybe np.arr('1 2 3;') should give the row-vector?)

Yes, it is reasonable that a semicolon should trigger 2d.


 Should there be some way to write 3d or higher-d arrays?

No, there should not.  This is for quick demos and that sort of thing. 
It is not a substitute for np.array().  (I'm not entirely convinced 
np.arr() is a good idea at all; but if it is, it must be kept simple.)

A possible downside for beginners is that this might delay their 
understanding that the commas are needed for np.array([1, 2, 3]).

Eric


 -n




Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-06 Thread Eric Firing
On 2014/07/06, 4:27 PM, Alexander Belopolsky wrote:

 On Sun, Jul 6, 2014 at 6:06 PM, Eric Firing efir...@hawaii.edu wrote:

   (I'm not entirely convinced
 np.arr() is a good idea at all; but if it is, it must be kept simple.)


 If you are going to introduce this functionality, please don't call it
 np.arr.

 Right now, np.a<tab> presents you with a whopping 53 completion choices.
 Adding "r" narrows that to 21, but np.arr<tab> completes to np.array
 right away.  Please don't introduce another bump in this road.

 Namespaces are one honking great idea -- let's do more of those!

 I would suggest calling it something like np.array_simple or
 np.array_from_string, but the best choice IMO, would be
 np.ndarray.from_string (a static constructor method).


I think the problem is that this defeats the point: minimizing typing 
when doing an off-the-cuff demo or test.  I don't know that this use 
case justifies the clutter, regardless of what it is called; but 
evidently there is some demand for it.

Eric







Re: [Numpy-discussion] segfault from scipy.io.netcdf with scipy-0.14 numpy-0.18

2014-05-08 Thread Eric Firing
On 2014/05/07 11:26 PM, Robert McGibbon wrote:
 Hey all,

 The travis tests for a library I work on just stopped working, and I
 tracked down the bug to the following test case. The file
 MDTraj/testing/reference/mdcrd.nc is a netcdf3 file
 in our repository
 (https://github.com/rmcgibbo/mdtraj/tree/master/MDTraj/testing/reference).

 this script:

 conda install --yes scipy==0.13 numpy==1.7 --quiet
 python -c 'import scipy.io; print scipy.io.netcdf.netcdf_file("MDTraj/testing/reference/mdcrd.nc").variables["coordinates"][:].sum()'

 conda install --yes scipy==0.14 numpy==1.8 --quiet
 python -c 'import scipy.io; print scipy.io.netcdf.netcdf_file("MDTraj/testing/reference/mdcrd.nc").variables["coordinates"][:].sum()'

 works on scipy==0.13 numpy==1.7, but segfaults on scipy==0.14
 numpy==1.8. I got the segfault on both linux and osx.

The netcdf module in scipy is a version of pupynere; maybe it needs to 
be updated.  I can reproduce the segfault using scipy, but not with the 
current version of pupynere, which you can install using pip.

Eric



 I tried compiling a new version of numpy from source with debug symbols
 using `python setup.py build_ext -g install`, but couldn't get a useful
 traceback.

 $ gdb --core=core
 (gdb) bt
 #0  0x7fd4f7887b18 in ?? ()
 #1  0x7fd4f786ecc6 in ?? ()
 #2  0x in ?? ()


 Anyone have any advice for tracking this down?

 -Robert






Re: [Numpy-discussion] List of arrays failing index(), remove() etc

2014-05-07 Thread Eric Firing
On 2014/05/07 2:14 PM, mfm24 wrote:
 I'm having a problem I haven't seen elsewhere (and apologies if it has
 been answered before).

 I see the following behavior (copied verbatim from a python session):

 Python 2.7.4 (default, Apr  6 2013, 19:55:15) [MSC v.1500 64 bit (AMD64)] on win32
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import numpy as np
 >>> x = [[np.zeros(10)] for i in range(10)]
 >>> x.index(x[0])
 0
 >>> x.index(x[1])
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 ValueError: The truth value of an array with more than one element is
 ambiguous. Use a.any() or a.all()
 >>> x[1].append(np.zeros(10))
 >>> x.index(x[1])
 1

 Any ideas why I see a ValueError when trying to find the index of a list
 containing a single ndarray?

In the first example, indexing with 0, it checks the first entry in x, 
finds that it *is* the target, and so returns the first index, 0.

In the second case, indexing with 1, it checks the first entry in x, 
finds that it is *not* the same object, so it checks to see if it has 
the same contents.  This leads it to compare two ndarrays for equality, 
which leads to the ValueError.
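
One way around it is to search by identity instead of equality (a sketch):

import numpy as np

x = [[np.zeros(10)] for i in range(10)]
# Compare by object identity, so no ndarray equality test is triggered:
idx = next(i for i, item in enumerate(x) if item is x[1])
print(idx)   # 1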

Eric


 -Matt
 
 View this message in context: List of arrays failing index(), remove()
 etc
 http://numpy-discussion.10968.n7.nabble.com/List-of-arrays-failing-index-remove-etc-tp37544.html
 Sent from the Numpy-discussion mailing list archive
 http://numpy-discussion.10968.n7.nabble.com/ at Nabble.com.






Re: [Numpy-discussion] Resolving the associativity/precedence debate for @

2014-03-22 Thread Eric Firing
On 2014/03/22 8:13 AM, Nathaniel Smith wrote:
 Hi all,

 After 88 emails we don't have a conclusion in the other thread (see
 [1] for background). But we have to come to some conclusion or another
 if we want @ to exist:-). So I'll summarize where the discussion
 stands and let's see if we can find some way to resolve this.

In case a vote from a previously non-voting reader helps:

I think the case for same-left, as you state it, is strong; it's simple 
and easy to remember, and *this* *matters*. A *strong* argument would be 
needed to override this consideration, and I haven't seen any such 
strong argument.  The basic advice to users is: be explicit--use 
parentheses as needed to show both the interpreter and readers of your 
code how you want the expression to be evaluated.  Relying on precedence 
and associativity works only when the rules are well established by 
convention, and the expression is quite simple.

Eric


Re: [Numpy-discussion] python array

2014-03-14 Thread Eric Firing
On 2014/03/13 9:09 PM, Sudheer Joseph wrote:
 Dear Oslen,

 I had a detailed look at the example you sent, and the points I got are below:

 a = np.arange(-8, 8).reshape((4, 4))
 b = ma.masked_array(a, mask=a < 0)


 In [33]: b[b < 4]
 Out[33]:
 masked_array(data = [-- -- -- -- -- -- -- -- 0 1 2 3],
              mask = [ True  True  True  True  True  True  True  True False
 False False False],
        fill_value = 99)

 In [34]: b[b < 4].shape
 Out[34]: (12,)

 In [35]: b[b < 4].data
 Out[35]: array([-8, -7, -6, -5, -4, -3, -2, -1,  0,  1,  2,  3])

 This shows that while numpy can do the boolean operation and list the data
 meeting the criteria (by masking the data further), it does not actually
 allow us to get the count of data that meets the criteria. I was interested
 in the count, because my objective was to find out how many numbers in the
 grid fall under different categories (<=4, >4 <=8, >8 <=10, etc.) and find
 the percentage of them.

   Is there a way to get the counts correctly? That is my botheration now!!

Certainly.  If all you need are statistics of the type you describe, 
where you are working with a 1-D array, then extract the unmasked values 
into an ordinary ndarray, and work with that:

a = np.random.randn(100)
am = np.ma.masked_less(a, -0.2)
print am.count()  # number of unmasked values
a_nomask = am.compressed()
print type(a_nomask)
print a_nomask.shape

# number of points with value less than 0.5:
print (a_nomask < 0.5).sum()
# (Boolean True is 1)

# Or if you want the actual array of values, not just the count:
a_nomask[a_nomask < 0.5]

Eric




 with best regards,
 Sudheer



Re: [Numpy-discussion] surprising behavior of np.asarray on masked arrays

2013-12-05 Thread Eric Firing
On 2013/12/05 5:14 PM, Faraz Mirzaei wrote:
 Hi,

 If I pass a masked array through np.asarray, I get original unmasked array.

 Example:

 test = np.array([[1, 0], [-1, 3]])

 testMasked = ma.masked_less_equal(test, 0)


 print testMasked

 [[1 --]

   [-- 3]]


 print testMasked.fill_value

 99


 print np.asarray(testMasked)

 [[ 1 0]

   [-1 3]]


 Is this behavior intentional? How does the np.asarray access the
 original masked values? Shouldn't the masked values be at least filled
 with fill_value?

It might be nice, but it's not the way it is.  If you want to preserve 
masked arrays, use np.asanyarray() instead of np.asarray().  If you want 
to end up with filled ndarrays, use np.ma.filled().
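
Using the arrays from the question (a sketch):

import numpy as np
import numpy.ma as ma

test = np.array([[1, 0], [-1, 3]])
testMasked = ma.masked_less_equal(test, 0)

print(np.asanyarray(testMasked))   # still a masked array; mask preserved
print(ma.filled(testMasked))       # ndarray; masked entries -> fill_value
print(ma.filled(testMasked, 0))    # or give the fill value explicitly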

Eric



 Thanks,


 Faraz







Re: [Numpy-discussion] Masked arrays: Rationale for False convention

2013-09-30 Thread Eric Firing
On 2013/09/30 4:05 PM, josef.p...@gmail.com wrote:
 On Mon, Sep 30, 2013 at 9:38 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:



 On Mon, Sep 30, 2013 at 7:05 PM, Ondřej Čertík ondrej.cer...@gmail.com
 wrote:

 Hi,

 What is the rationale for using False in 'mask' for elements that
 should be included?

 http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html

 As opposed to using True for elements that should be included, which
 is what I was intuitively expecting when I started using the masked
 arrays. This True convention also happens to be the one used in
 Fortran, see e.g.:

 http://gcc.gnu.org/onlinedocs/gfortran/SUM.html

 So it's confusing why NumPy would chose a False convention. Could it
 be, that NumPy views 'mask' as opacity? Then it would make sense to
 use True to make a value 'opaque'.


 There was a lengthy discussion of this point back when the NA work was done.
 You might be able to find the thread with a search.

 As to why it is as it is, I suspect it is historical consistency. Pierre
 wrote the masked array package for numpy, but it may very well go back to
 the masked array package implemented for Numeric.

 I don't know ancient history, but I thought it's natural. (Actually,
 I never thought about it.)

 I always thought `mask` indicates the masked (invalid, hidden)
 values, and masked arrays mask the missing values.

Exactly.  It is also consistent with the C and Unix convention of 
returning 0 on success and 1, or a non-zero error code on failure.  In a 
similar vein, it works nicely with bit-mapped quality control flags, 
etc.  When nothing is flagged, the value is good, and consequently not 
masked out.

Eric


 http://en.wikipedia.org/wiki/Masking_tape

 Josef


 Chuck







Re: [Numpy-discussion] Masked arrays: Rationale for False convention

2013-09-30 Thread Eric Firing
On 2013/09/30 4:57 PM, Ondřej Čertík wrote:
 On Mon, Sep 30, 2013 at 8:29 PM, Eric Firing efir...@hawaii.edu wrote:
 On 2013/09/30 4:05 PM, josef.p...@gmail.com wrote:
 On Mon, Sep 30, 2013 at 9:38 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:



 On Mon, Sep 30, 2013 at 7:05 PM, Ondřej Čertík ondrej.cer...@gmail.com
 wrote:

 Hi,

 What is the rationale for using False in 'mask' for elements that
 should be included?

 http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html

 As opposed to using True for elements that should be included, which
 is what I was intuitively expecting when I started using the masked
 arrays. This True convention also happens to be the one used in
 Fortran, see e.g.:

 http://gcc.gnu.org/onlinedocs/gfortran/SUM.html

 So it's confusing why NumPy would chose a False convention. Could it
 be, that NumPy views 'mask' as opacity? Then it would make sense to
 use True to make a value 'opaque'.


 There was a lengthy discussion of this point back when the NA work was 
 done.
 You might be able to find the thread with a search.

 As to why it is as it is, I suspect it is historical consistency. Pierre
 wrote the masked array package for numpy, but it may very well go back to
 the masked array package implemented for Numeric.

 I don't know ancient history, but I thought it's natural. (Actually,
 I never thought about it.)

 I always thought `mask` indicates the masked (invalid, hidden)
 values, and masked arrays mask the missing values.

 Exactly.  It is also consistent with the C and Unix convention of
 returning 0 on success and 1, or a non-zero error code on failure.  In a
 similar vein, it works nicely with bit-mapped quality control flags,
 etc.  When nothing is flagged, the value is good, and consequently not
 masked out.

 I see, that makes sense. So to remember this, the rule is:

 Specify elements that you want to get masked using True in 'mask'.

 But why do I need to invert the mask when I want to see the valid elements:

 In [1]: from numpy import ma

 In [2]: a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])

 In [3]: a
 Out[3]:
 masked_array(data = [1 2 -- 4],
   mask = [False False  True False],
 fill_value = 99)


 In [4]: a[~a.mask]
 Out[4]:
 masked_array(data = [1 2 4],
   mask = [False False False],
 fill_value = 99)


 I would find it natural to write [4] as a[a.mask]. This is when it gets 
 confusing.

There is no getting around it; each of the two possible conventions has 
its advantages.  But try this instead:

In [2]: a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])

In [3]: a.compressed()
Out[3]: array([1, 2, 4])


I do occasionally need a goodmask which is the inverse of a.mask, but 
not very often; and when I do, needing to invert a.mask doesn't bother me.

Eric


 For example in Fortran, one does:

 integer :: a(4) = [1, 2, 3, 4]
 logical :: mask(4) = [.true., .true., .false., .true.]
 print *, a
 print *, pack(a, mask)

 and it prints:

 1   2   3   4
 1   2   4

 So the behavior of mask when used as an index to select elements from
 an array is identical to NumPy --- True means include the element,
 False means exclude it.

 Ondrej




Re: [Numpy-discussion] Strange behavior with boolean slices...

2013-08-25 Thread Eric Firing
On 2013/08/25 2:30 PM, Cera, Tim wrote:
 I have done this before, but am now really confused.

 Created an array 'day' specifying the 'f' type

 In [29]: day
 Out[29]: array([ 5.,  5.], dtype=float32)

 # Have a mask...
 In [30]: mask
 Out[30]: array([ True, False], dtype=bool)

 # So far, so good...
 In [31]: day[mask]
 Out[31]: array([ 5.], dtype=float32)

 In [32]: day[mask] = 10

 # What?
 In [33]: day
 Out[33]: array([ 10.,  10.], dtype=float32)

I'm not getting that with 1.7.0:
In [2]: np.__version__
Out[2]: '1.7.0'

In [3]: mask = np.array([True, False], dtype=bool)

In [4]: day = np.array([5, 5], dtype=np.float32)

In [5]: day
Out[5]: array([ 5.,  5.], dtype=float32)

In [6]: mask
Out[6]: array([ True, False], dtype=bool)

In [7]: day[mask]
Out[7]: array([ 5.], dtype=float32)

In [8]: day[mask] = 10

In [9]: day
Out[9]: array([ 10.,   5.], dtype=float32)

Eric




 So I created an integer array 'a'

 In [38]: a
 Out[38]: array([11,  1])

 In [39]: a[mask]
 Out[39]: array([11])

 In [40]: a[mask] = 12

 # This is what I expect.
 In [41]: a
 Out[41]: array([12,  1])

 Am I missing something?  Is this supposed to happen?

 Version 1.7.1.

 Kindest regards,
 Tim




Re: [Numpy-discussion] Strange behavior with boolean slices...

2013-08-25 Thread Eric Firing
On 2013/08/25 2:30 PM, Cera, Tim wrote:
 I have done this before, but am now really confused.

 Created an array 'day' specifying the 'f' type

 In [29]: day
 Out[29]: array([ 5.,  5.], dtype=float32)

 # Have a mask...
 In [30]: mask
 Out[30]: array([ True, False], dtype=bool)

 # So far, so good...
 In [31]: day[mask]
 Out[31]: array([ 5.], dtype=float32)

 In [32]: day[mask] = 10

 # What?
 In [33]: day
 Out[33]: array([ 10.,  10.], dtype=float32)


I don't get it with 1.7.1, either:

In [2]: np.__version__
Out[2]: '1.7.1'

In [3]: %paste
mask = np.array([True, False], dtype=bool)
day = np.array([5, 5], dtype=np.float32)
day
mask
day[mask]
day[mask] = 10
day
## -- End pasted text --
Out[3]: array([ 10.,   5.], dtype=float32)

My 1.7.0 example is on a Mac, the 1.7.1 is on a Linux virtual machine, 
both 64-bit.

Eric


 So I created an integer array 'a'

 In [38]: a
 Out[38]: array([11,  1])

 In [39]: a[mask]
 Out[39]: array([11])

 In [40]: a[mask] = 12

 # This is what I expect.
 In [41]: a
 Out[41]: array([12,  1])

 Am I missing something?  Is this supposed to happen?

 Version 1.7.1.

 Kindest regards,
 Tim




Re: [Numpy-discussion] strange behavior of variable

2013-08-18 Thread Eric Firing
On 2013/08/17 9:49 PM, Sudheer Joseph wrote:
 Hi,
   I have defined a small function to find the n maximum values
 of an array as below. With in it I assign the input array to a second
 array and temporarily make the array location after first iteration as
 nan. I expected this temporary change to be limited to the second
 variable. However, my initial variable gets modified. Can anyone throw
 some light on what is happening here?  In case of Matlab this logic works.

 ##
 #FUNCTION maxn
 ##
 import numpy as np
 def max_n(a,n):
   b=a

This is not making b a copy of a, it is simply making it an alias 
for it.  To make it a copy you could use b = a.copy().  (Note that 
b = a[:] would copy a list, but for an ndarray it gives only a view.)

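A minimal sketch of the difference (variable names are illustrative):

import numpy as np

a = np.zeros(3)
b = a            # alias: b shares a's data
b[0] = 99
print(a[0])      # 99.0 -- modifying b also modified a

c = a.copy()     # an independent copy
c[1] = -1
print(a[1])      # 0.0 -- a is unchanged
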
It sounds like you don't really need a function, however.  Try this:

# test data:
a = np.random.randn(10)
n = 2

# One-line solution:
biggest_n = np.sort(a)[-n:]

print a
print biggest_n

If you want them ordered from largest to smallest, just reverse the list:

biggest_n = biggest_n[::-1]
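
If what you actually want is the *indices* of the n largest values, as in 
the quoted function, a non-destructive sketch using the same test data:

inds = np.argsort(a)[-n:][::-1]   # indices of the n largest, descending
print(inds, a[inds])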

Eric


   result=[]
   for i in np.arange(1,n+1):
   mxidx=np.where(b==max(b))
   result.append(mxidx)
   b[mxidx]=np.nan
   result=np.ravel(result)
   return(result)

 ### TEST
 In [8]: x=np.arange(float(0),10)

 In [9]: max
 max     max_n

 In [9]: max_n(x,2)
 Out[9]: array([9, 8])

 In [10]: x
 Out[10]: array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,  nan,  nan])
 ***
 Sudheer Joseph
 Indian National Centre for Ocean Information Services
 Ministry of Earth Sciences, Govt. of India
 POST BOX NO: 21, IDA Jeedeemetla P.O.
 Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55
 Tel:+91-40-23886047(O),Fax:+91-40-23895011(O),
 Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile)
 E-mail:sjo.in...@gmail.com;sudheer.jos...@yahoo.com
 Web- http://oppamthadathil.tripod.com
 ***


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] bug fixes: which branch?

2013-06-16 Thread Eric Firing
What is the preferred strategy for handling bug fix PRs?  Initial fix on 
master, and then a separate PR to backport to v1.7.x?  Or the reverse? 
It doesn't look like v1.7.x is being merged into master regularly, so 
the matplotlib pattern (fix on maintenance, merge maintenance into 
master) seems not to be used here.

Thanks.

Eric
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] please close 611, 629, 2490, 2264

2013-06-16 Thread Eric Firing
Github issues 611, 629, and 2490 are duplicates.  611 included patches 
with a test and a fix, both of which were committed long ago, so all 
three issues should be closed.

Please see my comment on 2264 as to why that should be closed.

On 1417, please remove the component:numpy.ma label and add the 
component:matrixlib label.

Eric
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] saving 3d array

2013-06-15 Thread Eric Firing
On 2013/06/15 6:06 AM, Pierre GM wrote:

 On Jun 15, 2013, at 17:35 , Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 On Sat, Jun 15, 2013 at 2:51 PM, Sudheer Joseph
 sudheer.jos...@yahoo.com wrote:

 Thank you very much for this tip.
 Is there a typical way to save the masked part and the rest separately?  I'm not very 
 familiar with array handling in numpy.

 I don't use masked array myself, but it looks like it would be something 
 like:

 eof1_unmasked = np.array(eof1)
 eof1_mask = eof1.mask

 then you could save those two.  Maybe a more maskey person could comment?

 Instead of `eof1_unmasked=np.array(eof1)`, you could do `eof1_unmasked = 
 eof1.data`. The '.data' attribute points to  a view of the masked array as a 
 simple ndarray.

 You may also wanna try `eof1.torecords()` that will return a structured array 
 with dtype `[('_data',type_of_eof1),('_mask', bool)]`.

For automated saving and restoring, try this:

http://currents.soest.hawaii.edu/hgstage/pycurrents/file/686c2802a6c4/file/npzfile.py
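
If you would rather avoid the dependency, a minimal self-contained sketch 
of the round trip with np.savez (the file name is illustrative):

import numpy as np

eof1 = np.ma.masked_invalid([1.0, np.nan, 3.0])
np.savez('eof1.npz', data=eof1.filled(0), mask=np.ma.getmaskarray(eof1))

f = np.load('eof1.npz')
restored = np.ma.masked_array(f['data'], mask=f['mask'])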

Eric

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.filled, again

2013-06-14 Thread Eric Firing
On 2013/06/14 5:15 AM, Alan G Isaac wrote:
 On 6/14/2013 9:27 AM, Aldcroft, Thomas wrote:
 If I just saw np.values(..) in some code I would never guess what it is 
 doing from the name

 That suggests np.fromvalues.
 But more important than the name I think
 is allowing broadcasting of the values,
 based on NumPy's broadcasting rules.
 Broadcasting a scalar is then a special case,
 even if it is the case that has dominated this thread.

True, but this looks to me like mission creep.  All of this fuss is 
about replacing two lines of user code with a single line.  If it can't 
be kept simple, both in implementation and in documentation, it 
shouldn't be done at all.  I'm not necessarily opposed to your 
suggestion, but I'm skeptical.

Eric


 Alan Isaac

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] NA, and replacement or reimplimentation of np.ma

2013-06-14 Thread Eric Firing
On 2013/06/14 7:22 AM, Nathaniel Smith wrote:
 On Wed, Jun 12, 2013 at 7:43 PM, Eric Firing efir...@hawaii.edu wrote:
 On 2013/06/12 2:10 AM, Nathaniel Smith wrote:
 Personally I think that overloading np.empty is horribly ugly, will
 continue confusing newbies and everyone else indefinitely, and I'm
 100% convinced that we'll regret implementing such a warty interface
 for something that should be so idiomatic. (Unfortunately I got busy
 and didn't actually say this in the previous thread though.) So I
 think we should just merge the PR as is. The only downside is the
 np.ma inconsistency, but, np.ma is already inconsistent (cf.
 masked_array.fill versus masked_array.filled!), somewhat deprecated,

 somewhat deprecated?  Really?  Since when?  By whom?  Replaced by what?

 Sorry, not trying to start a fight, just trying to summarize the
 situation. As far as I can tell:

 Despite heroic efforts on the part of its authors, numpy.ma has a
 number of weird quirks (masked data can still trigger invalid value
 errors), misfeatures (hard versus soft masks), and just plain old pain
 points (ongoing issues with whether any given operation will respect
 or preserve the mask).

 It's been in deep maintenance mode for some time; we merge the
 occasional bug fix that people send in, and that's it. (To be fair,
 numpy as a whole is fairly slow-moving, but numpy.ma still gets much
 less attention.)

 Even if there were active maintainers, no-one really has any idea how
 to fix any of the problems above; they're not so much bugs as
 intrinsic limitations of the design.

 Therefore, my impression is that a majority (not all, but a majority)
 of numpy developers strongly recommend against the use of numpy.ma in
 new projects.

 I could be wrong! And I know there's nothing to really replace it. I'd
 like to fix that. But I think semi-deprecated is not an unfair
 shorthand for the above.

 (I'll even admit that I'd *like* to actually deprecate it. But what I
 mean by that is, I don't think it's possible to fix it to the point
 where it's actually a solid/clean/robust library, so I'd like to reach
 a point where everyone who's currently using it is happier switching
 to something else and is happy to sign off on deprecating it.)


Nathaniel,

I've been pondering when to bring this up again, but you did it for me, 
so here it is with a new title for the thread.  Maybe it will be short 
and sweet, maybe not.

I think we can agree that there is major interest in having good numpy 
support for one or more styles of missing/masked values.  You might not 
agree, but I will assert that the style of support provided by np.ma is 
*very* useful; it serves a real purpose in working code.  We do agree 
that np.ma has problems.  It is not at all clear to me, however, that 
those problems cannot or should not be fixed.  Even if they can't, I 
don't think they are so severe that it is wise to try to kill off np.ma 
*before* there is a good replacement.

In the NA branch, an attempt was made to lay the groundwork for solid 
missing/masked support.  I did not agree with every design aspect, but I 
thought it was nevertheless good as groundwork, and could be used to 
greatly improve np.ma, to provide a different style of support for those 
who require it, and perhaps to lead over the very long term to a 
withering away of the need for np.ma.

Some of the groundwork from the NA branch survived, but most of it is 
sitting off to the side.

Is there any way to revive this line of development?  To satisfy the 
needs of people coming from the R world *and* of people for whom np.ma 
is, despite its warts, an important tool?  This seems to me to be the 
single biggest area where numpy needs development.

It looks like this problem needs dedicated resources: a grant, a major 
corporate effort, or both.

Numpy is central to python in science, but it doesn't seem to have a 
corresponding level of direction and support.

Eric


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] time to revisit NA/ma ideas

2013-06-14 Thread Eric Firing
A nice summary of the discussions from a year ago is here:

http://www.numpy.org/NA-overview.html

It provides food for thought.

Eric
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.filled, again

2013-06-13 Thread Eric Firing
On 2013/06/13 10:36 AM, Benjamin Root wrote:

 On Thu, Jun 13, 2013 at 9:36 AM, Aldcroft, Thomas
 aldcr...@head.cfa.harvard.edu mailto:aldcr...@head.cfa.harvard.edu
 wrote:




 On Wed, Jun 12, 2013 at 2:55 PM, Eric Firing efir...@hawaii.edu
 mailto:efir...@hawaii.edu wrote:

 On 2013/06/12 8:13 AM, Warren Weckesser wrote:
   That's why I suggested 'filledwith' (add the underscore if
 you like).
   This also allows a corresponding masked implementation,
 'ma.filledwith',
   without clobbering the existing 'ma.filled'.

 Consensus on np.filled? absolutely not, you do not have a consensus.

 np.filledwith or filled_with: fine with me, maybe even with
 everyone--let's see.  I would prefer the underscore version.


 +1 on np.filled_with.  It's unique and the meaning is extremely obvious.
   We do use np.ma.filled in astropy so a big -1 on deprecating that
 (which would then require doing numpy version checks to get the
 right method).  Even when there is an NA dtype the numpy.ma
 http://numpy.ma users won't go away anytime soon.


 I like np.filled_with(), but just to be devil's advocate, think of the
 syntax:

 np.filled_with((10, 24), np.nan)

 As I read that, I am filling the array with (10, 24), not NaNs.  Minor
 issue, for sure, but just thought I raise that.

 -1 on deprecation of np.ma.filled().  -1 on np.filled() due to collision
 with np.ma http://np.ma (both conceptually and programatically).

 np.values() might be a decent alternative.

 Cheers!
 Ben Root

Even if he is representing the devil, Ben raises a good point.  To 
summarize, the most recent set of suggestions that seem not to have been 
completely shot down include:

np.filled_with((10, 24), np.nan)
np.full((10, 24), np.nan)  # analogous to np.empty
np.values((10, 24), np.nan)# seems clear, concise
np.initialized((10, 24), np.nan)   # a few more characters, but
#  seems clear to me.

Personally, I like all of the last three better than the first.

Eric


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.filled, again

2013-06-12 Thread Eric Firing
On 2013/06/12 2:10 AM, Nathaniel Smith wrote:
 Hi all,

 It looks like we've gotten a bit confused and need to untangle
 something. There's a PR to add new functions 'np.filled' and
 'np.filled_like':
https://github.com/numpy/numpy/pull/2875
 And there was a discussion about this on the list back in January:
http://thread.gmane.org/gmane.comp.python.numeric.general/52763

 I think a reasonable summary of the opinions in the thread are:
 - This functionality is great, ...
 - ...but we can't call it 'np.filled' because there's also
 'np.ma.filled' which does something else...
 - ...but there really aren't any better names...

How about 'np.initialized'?

 - ...so we should overload np.empty, like: 'np.empty(shape, fill=value)'

 In the mean time the original submitter has continued puttering along
 polishing the original patch, and it's ready to merge... except it's
 still the original interface, somehow the thread discussion and the PR
 discussion never met up.

 So, we have to decide what to do.

 Personally I think that overloading np.empty is horribly ugly, will
 continue confusing newbies and everyone else indefinitely, and I'm
 100% convinced that we'll regret implementing such a warty interface
 for something that should be so idiomatic. (Unfortunately I got busy
 and didn't actually say this in the previous thread though.) So I
 think we should just merge the PR as is. The only downside is the
 np.ma inconsistency, but, np.ma is already inconsistent (cf.
 masked_array.fill versus masked_array.filled!), somewhat deprecated,

somewhat deprecated?  Really?  Since when?  By whom?  Replaced by what?

 and AFAICT there are far more people who will benefit from a clean
 np.filled idiom than who actually use np.ma (and in particular its
 fill-value functionality). So there would be two

I think there are more np.ma users than you realize.  Everyone who uses 
matplotlib is using np.ma at least implicitly, if not explicitly.  Many 
of the matplotlib examples put np.ma to good use.  np.ma.filled is an 
essential long-standing part of the np.ma API.  I don't see any good 
rationale for generating a conflict with it, when an adequate 
non-conflicting alternative ('np.initialized', maybe others) exists.

Eric

 bad-but-IMHO-acceptable options: either live with an inconsistency
 between np.filled and np.ma.filled, or deprecate np.ma.filled in favor
 of masked_array.filled (which does exactly the same thing) and
 eventually switch np.ma.filled to be consistent with the new
 np.filled.

 But, that's just my opinion.

 -n
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.filled, again

2013-06-12 Thread Eric Firing
On 2013/06/12 4:18 AM, Nathaniel Smith wrote:
 Now imagine a different new version of this page, if we overload
 'empty' to add a fill= option. I don't even know how we document that
 on this page. The list will remain:
empty
ones
zeros

Opposite of empty: full.  So that is another non-conflicting 
alternative to my earlier suggestion of initialized.
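
As it would read in use (a sketch of the proposed function):

a = np.full((10, 24), np.nan)   # analogous to np.empty((10, 24))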

Eric
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.filled, again

2013-06-12 Thread Eric Firing
On 2013/06/12 8:13 AM, Warren Weckesser wrote:
 That's why I suggested 'filledwith' (add the underscore if you like).
 This also allows a corresponding masked implementation, 'ma.filledwith',
 without clobbering the existing 'ma.filled'.

Consensus on np.filled? absolutely not, you do not have a consensus.

np.filledwith or filled_with: fine with me, maybe even with 
everyone--let's see.  I would prefer the underscore version.

Eric
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] empty_like for masked arrays

2013-06-10 Thread Eric Firing
On 2013/06/10 10:17 AM, Aldcroft, Thomas wrote:
 I use np.ma http://np.ma, and for me the most intuitive would be the
 second option where the new array matches the original array in shape
 and dtype, but always has an empty mask.  I always think of the *_like()
 functions as just copying shape and dtype, so it would be a bit
 surprising to get part of the data (the mask) from the original.  If you
 do need the mask then on the next line you have an explicit statement to
 copy the mask and the code and intent will be clear.  Also, most of the
 time the mask is set because that particular data value was bad or
 missing, so it seems like it would be a less-common use case to want a
 new empty array with the same mask.


I also use np.ma (and it is used internally in matplotlib).  I agree 
with Tom.  I think all of the *_like() functions should start with 
mask=False, meaning nothing is masked by default.  I don't see what the 
reasonable use cases would be for any alternative.
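
A sketch of the advocated behavior, spelled with explicit construction:

import numpy as np

a = np.ma.masked_array([1.0, 2.0, 3.0], mask=[True, False, False])
# same shape and dtype as a, but with nothing masked:
b = np.ma.masked_array(np.empty_like(a.data), mask=False)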

Eric
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-04 Thread Eric Firing
On 2013/06/04 2:05 PM, Charles R Harris wrote:


 On Tue, Jun 4, 2013 at 12:07 PM, Slavin, Jonathan
 jsla...@cfa.harvard.edu mailto:jsla...@cfa.harvard.edu wrote:

 Hi,

 I would like to suggest that the behavior of numpy.interp be changed
 regarding treatment of situations in which the x-coordinates are not
 monotonically increasing.  Specifically, it seems to me that interp
 should work correctly when the x-coordinate is decreasing
 monotonically.  Clearly it cannot work if the x-coordinate is not
 monotonic, but in that case it should raise an exception.  Currently
 if x is not increasing it simply silently fails, providing incorrect
 values.  This fix could be as simple as a monotonicity test and
 inversion if necessary (plus a raise statement for non-monotonic cases).


 Seems reasonable, although it might add a bit of execution time.

The monotonicity test should be an option if it is available at all; 
when interpolating a small number of points from a large pair of arrays, 
the single sweep through the whole array could dominate the execution 
time.  Checking for increasing versus decreasing, in contrast, can be 
done fast, so handling the decreasing case transparently is reasonable.
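
A sketch of the transparent handling being suggested (the wrapper name is 
illustrative):

import numpy as np

def interp_dec(xnew, xp, fp):
    # np.interp requires increasing xp; reverse a decreasing table first
    if xp[0] > xp[-1]:
        xp, fp = xp[::-1], fp[::-1]
    return np.interp(xnew, xp, fp)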

Eric


 Chuck

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-04 Thread Eric Firing
On 2013/06/04 4:15 PM, Benjamin Root wrote:
 Could non-monotonicity be detected as part of the interp process?
 Perhaps a sign switch in the deltas?

There are two code paths, depending on the number of points to be 
interpolated.  When it is greater than the size of the table, the deltas 
are pre-computed in a single sweep.  Non-monotonicity could be detected 
there at moderate cost.  In the other code path, for a smaller number of 
points, the deltas are computed only as needed, so monotonicity testing 
would require a separate sweep through the points.  That's the costly 
case that I think might reasonably be an option but that should not be 
required.

Eric


 I have been bitten by this problem too.

 Cheers!
 Ben Root

 On Jun 4, 2013 9:08 PM, Eric Firing efir...@hawaii.edu
 mailto:efir...@hawaii.edu wrote:
  
   On 2013/06/04 2:05 PM, Charles R Harris wrote:
   
   
On Tue, Jun 4, 2013 at 12:07 PM, Slavin, Jonathan
jsla...@cfa.harvard.edu mailto:jsla...@cfa.harvard.edu
 mailto:jsla...@cfa.harvard.edu mailto:jsla...@cfa.harvard.edu wrote:
   
Hi,
   
I would like to suggest that the behavior of numpy.interp be
 changed
regarding treatment of situations in which the x-coordinates
 are not
monotonically increasing.  Specifically, it seems to me that interp
should work correctly when the x-coordinate is decreasing
monotonically.  Clearly it cannot work if the x-coordinate is not
monotonic, but in that case it should raise an exception.
   Currently
if x is not increasing it simply silently fails, providing
 incorrect
values.  This fix could be as simple as a monotonicity test and
inversion if necessary (plus a raise statement for
 non-monotonic cases).
   
   
Seems reasonable, although it might add a bit of execution time.
  
   The monotonicity test should be an option if it is available at all;
   when interpolating a small number of points from a large pair of arrays,
   the single sweep through the whole array could dominate the execution
   time.  Checking for increasing versus decreasing, in contrast, can be
   done fast, so handling the decreasing case transparently is reasonable.
  
   Eric
  
   
Chuck
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org mailto:NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] GSOC 2013

2013-03-06 Thread Eric Firing
On 2013/03/05 8:14 AM, Kurt Smith wrote:
 On Tue, Mar 5, 2013 at 1:45 AM, Eric Firing efir...@hawaii.edu wrote:
 On 2013/03/04 9:01 PM, Nicolas Rougier wrote:
 This made me think of a serious performance limitation of structured 
 dtypes: a
 structured dtype is always packed, which may lead to terrible byte 
 alignment
 for common types.  For instance, `dtype([('a', 'u1'), ('b',
 'u8')]).itemsize == 9`,
 meaning that the 8-byte integer is not aligned as an equivalent C-struct's
 would be, leading to all sorts of horrors at the cache and register level.

 Doesn't the align kwarg of np.dtype do what you want?

 In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']),
 align=True)

 In [3]: dt.itemsize
 Out[3]: 16

 Thanks!  That's what I get for not checking before posting.

 Consider this my vote to make `aligned=True` the default.

I strongly oppose this, because it would break the common usage of 
structured dtypes for reading packed binary data from files.  I see no 
reason to change the default.
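
A sketch of that common usage (the file name is hypothetical):

import numpy as np

# packed layout (itemsize 9) matching the on-disk record format exactly
rec = np.dtype([('a', 'u1'), ('b', '<u8')])
data = np.fromfile('records.bin', dtype=rec)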

Eric



 Eric
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] GSOC 2013

2013-03-04 Thread Eric Firing
On 2013/03/04 9:01 PM, Nicolas Rougier wrote:
 This made me think of a serious performance limitation of structured 
 dtypes: a
 structured dtype is always packed, which may lead to terrible byte 
 alignment
 for common types.  For instance, `dtype([('a', 'u1'), ('b',
 'u8')]).itemsize == 9`,
 meaning that the 8-byte integer is not aligned as an equivalent C-struct's
 would be, leading to all sorts of horrors at the cache and register level.

Doesn't the align kwarg of np.dtype do what you want?

In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']), 
align=True)

In [3]: dt.itemsize
Out[3]: 16

Eric
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New numpy functions: filled, filled_like

2013-01-17 Thread Eric Firing
On 2013/01/17 4:13 AM, Pierre Haessig wrote:
 Hi,

 Le 14/01/2013 20:05, Benjamin Root a écrit :
 I do like the way you are thinking in terms of the broadcasting
 semantics, but I wonder if that is a bit awkward.  What I mean is, if
 one were to use broadcasting semantics for creating an array, wouldn't
 one have just simply used broadcasting anyway?  The point of
 broadcasting is to _avoid_ the creation of unneeded arrays.  But maybe
 I can be convinced with some examples.

 I feel that one of the point of the discussion is : although a new (or
 not so new...) function to create a filled array would be more elegant
 than the existing pair of functions np.zeros and np.ones, there are
 maybe not so many usecases for filled arrays *other than zeros values*.

 I can remember having initialized a non-zero array *some months ago*.
 For the anecdote it was a vector of discretized vehicule speed values
 which I wanted to be initialized with a predefined mean speed value
 prior to some optimization. In that usecase, I really didn't care about
 the performance of this initialization step.

 So my overall feeling after this thread is
   - *yes* a single dedicated fill/init/someverb function would give a
 slightly better API,
   -  but *no* it's not important because np.empty and np.zeros cover 95%
 of use cases!

I agree with your summary and conclusion.

Eric


 best,
 Pierre



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New numpy functions: filled, filled_like

2013-01-14 Thread Eric Firing
On 2013/01/14 6:15 AM, Olivier Delalleau wrote:
 - I agree the name collision with np.ma.filled is a problem. I have no
 better suggestion though at this point.

How about initialized()?
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New numpy functions: filled, filled_like

2013-01-13 Thread Eric Firing
On 2013/01/13 7:27 AM, Nathaniel Smith wrote:
 Hi all,

 PR 2875 adds two new functions, that generalize zeros(), ones(),
 zeros_like(), ones_like(), by simply taking an arbitrary fill value:
https://github.com/numpy/numpy/pull/2875
 So
np.ones((10, 10))
 is the same as
np.filled((10, 10), 1)

 The implementations are trivial, but the API seems useful because it
 provides an idiomatic way of efficiently creating an array full of
 inf, or nan, or None, whatever funny value you need. All the
 alternatives are either inefficient (np.ones(...) * np.inf) or
 cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But
 there's a question of taste here; one could argue instead that these
 just add more clutter to the numpy namespace. So, before we merge,
 anyone want to chime in?

I'm neutral to negative as to whether it is worth adding these to the 
namespace; I don't mind using the cumbersome alternative.

Note also that there is already a numpy.ma.filled() function for quite a 
different purpose, so putting a filled() in numpy breaks the pattern 
that ma has masked versions of most numpy functions.

This consideration actually tips me quite a bit toward the negative 
side.  I don't think I am unique in relying heavily on masked arrays.


 (Bonus, extra bike-sheddy survey: do people prefer
np.filled((10, 10), np.nan)
np.filled_like(my_arr, np.nan)

+1 for this form if you decide to do it despite the problem mentioned above.

 or
np.filled(np.nan, (10, 10))
np.filled_like(np.nan, my_arr)

This one is particularly bad for filled_like, therefore bad for both.

Eric

 ?)

 -n
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Regression: in-place operations (possibly intentional)

2012-09-21 Thread Eric Firing
On 2012/09/21 12:20 PM, Nathaniel Smith wrote:
 On Fri, Sep 21, 2012 at 10:04 PM, Chris Barker chris.bar...@noaa.gov wrote:
 On Fri, Sep 21, 2012 at 10:03 AM, Nathaniel Smith n...@pobox.com wrote:

 You're right of course. What I meant is that
a += b
 should produce the same result as
a[...] = a + b

 If we change the casting rule for the first one but not the second, though,
 then these will produce different results if a is integer and b is float:

 I certainly agree that we would want that; however, numpy still needs
 to deal with python semantics, which means that while (at the numpy
 level) we can control what a[...] = means, and we can control what
 a + b produces, we can't change what a + b means depending on the
 context of the left hand side.

 that means we need to do the casting at the assignment stage, which I
 guess is your point -- so:

 a_int += a_float

 should do the addition with the regular casting rules, then cast to
 an int after doing that.

 not sure about the implementation details.

 Yes, that seems to be what happens.

 In [1]: a = np.arange(3)

 In [2]: a *= 1.5

 In [3]: a
 Out[3]: array([0, 1, 3])

 But still, the question is, can and should we tighten up the
 assignment casting rules to same_kind or similar?

An example of where tighter casting seems undesirable is the case of 
functions that return integer values with floating point dtype, such as 
rint().  It seems natural to do something like

In [1]: ind = np.empty((3,), dtype=int)

In [2]: np.rint(np.arange(3, dtype=float) / 3, out=ind)
Out[2]: array([0, 0, 1])

where one is generating integer indices based on some manipulation of 
floating point numbers.  This works in 1.6 but fails in 1.7.
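
For code that needs the old behavior under 1.7, the cast can be made 
explicit (a sketch, assuming the ufunc-level casting keyword):

np.rint(np.arange(3, dtype=float) / 3, out=ind, casting='unsafe')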

Eric

 -n
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.ma.MaskedArray.min() makes a copy?

2012-09-18 Thread Eric Firing
On 2012/09/18 7:40 AM, Benjamin Root wrote:


 On Fri, Sep 7, 2012 at 12:05 PM, Nathaniel Smith n...@pobox.com
 mailto:n...@pobox.com wrote:

 On 7 Sep 2012 14:38, Benjamin Root ben.r...@ou.edu
 mailto:ben.r...@ou.edu wrote:
  
   An issue just reported on the matplotlib-users list involved a
 user who ran out of memory while attempting to do an imshow() on a
 large array.  While this wouldn't be totally unexpected, the user's
 traceback shows that they ran out of memory before any actual
 building of the image occurred.  Memory usage sky-rocketed when
 imshow() attempted to determine the min and max of the image.  The
 input data was a masked array, and it appears that the
 implementation of min() for masked arrays goes something like this
 (paraphrasing here):
  
   obj.filled(inf).min()
  
   The idea is that any masked element is set to the largest
 possible value for their dtype in a copied array of itself, and then
 a min() is performed on that copied array.  I am assuming that max()
 does the same thing.
  
   Can this be done differently/more efficiently?  If the filled
 approach has to be done, maybe it would be a good idea to make the
 copy in chunks instead of all at once?  Ideally, it would be nice to
 avoid the copying altogether and utilize some of the special
  iterators that Mark Wiebe created last year.

 I think what you're looking for is where= support for ufunc.reduce.
 This isn't implemented yet but at least it's straightforward in
 principle... otherwise I don't know anything better than
 reimplementing .min() by hand.

 -n



 Yes, it was the where= support that I was thinking of.  I take it that
 it was pulled out of the 1.7 branch with the rest of the NA stuff?

The where= support was left in:
http://docs.scipy.org/doc/numpy/reference/ufuncs.html

See also get_ufunc_arguments in ufunc_object.c.
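
For reference, a sketch of how a copy-free masked minimum might eventually 
be spelled once where= support reaches the reduce methods (not available at 
the time of this thread):

m = np.minimum.reduce(a.data, where=~np.ma.getmaskarray(a), initial=np.inf)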

Eric



 Ben Root



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Regression: in-place operations (possibly intentional)

2012-09-18 Thread Eric Firing
On 2012/09/18 9:25 AM, Charles R Harris wrote:


 On Tue, Sep 18, 2012 at 1:13 PM, Benjamin Root ben.r...@ou.edu
 mailto:ben.r...@ou.edu wrote:



 On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris
 charlesr.har...@gmail.com mailto:charlesr.har...@gmail.com wrote:



 On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root ben.r...@ou.edu
 mailto:ben.r...@ou.edu wrote:



 On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris
 charlesr.har...@gmail.com
 mailto:charlesr.har...@gmail.com wrote:



 On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant
 tra...@continuum.io mailto:tra...@continuum.io wrote:


 On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote:

   Consider the following code:
  
   import numpy as np
   a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
   a *= float(255) / 15
  
   In v1.6.x, this yields:
   array([17, 34, 51, 68, 85], dtype=int16)
  
   But in master, this throws an exception about
 failing to cast via same_kind.
  
   Note that numpy was smart about this operation
 before, consider:
   a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
   a *= float(128) / 256

   yields:
   array([0, 1, 1, 2, 2], dtype=int16)
  
   Of course, this is different than if one does it
 in a non-in-place manner:
   np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5
  
   which yields an array with floating point dtype
 in both versions.  I can appreciate the arguments
 for preventing this kind of implicit casting between
 non-same_kind dtypes, but I argue that because the
 operation is in-place, then I (as the programmer) am
 explicitly stating that I desire to utilize the
 current array to store the results of the operation,
 dtype and all.  Obviously, we can't completely turn
 off this rule (for example, an in-place addition
 between integer array and a datetime64 makes no
 sense), but surely there is some sort of happy
 medium that would allow these sort of operations to
 take place?
  
   Lastly, if it is determined that it is desirable
 to allow in-place operations to continue working
 like they have before, I would like to see such a
 fix in v1.7 because if it isn't in 1.7, then other
 libraries (such as matplotlib, where this issue was
 first found) would have to change their code anyway
 just to be compatible with numpy.

 I agree that in-place operations should allow
 different casting rules.  There are different
 opinions on this, of course, but generally this is
 how NumPy has worked in the past.

 We did decide to change the default casting rule to
 same_kind but making an exception for in-place
 seems reasonable.


 I think that in these cases same_kind will flag what are
 most likely programming errors and sloppy code. It is
 easy to be explicit and doing so will make the code more
 readable because it will be immediately obvious what the
 multiplicand is without the need to recall what the
  numpy casting rules are in this exceptional case. I seem to recall
 several mentions of this before (Gael?), and in some of
 those cases it turned out that bugs were being turned
 up. Catching bugs with minimal effort is a good thing.

 Chuck


 True, it is quite likely to be a programming error, but then
 again, there are many cases where it isn't.  Is the problem
 strictly that we are trying to downcast the float to an int,
 or is it that we are trying to downcast to a lower
 precision?  Is there a way for one to explicitly relax the
 same_kind restriction?


 I think the problem is down casting across kinds, with the
 result that floats are truncated and the imaginary parts of
  imaginaries might be discarded. That is, the value, not just the
  precision, of the result can change. 
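
For concreteness, an explicit spelling of the in-place multiply that 
satisfies same_kind (a sketch, assuming the ufunc-level casting keyword):

a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
np.multiply(a, float(255) / 15, out=a, casting='unsafe')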

Re: [Numpy-discussion] Fancy-indexing reorders output in corner cases?

2012-05-15 Thread Eric Firing
On 05/14/2012 06:03 PM, Travis Oliphant wrote:
 What happens, though when you have

 a[:, in1 :, in2]?

 in1 and in2 are broadcasted together to create a two-dimensional
 sub-space that must fit somewhere.   Where should it go?   Should
 it replace in1 or in2?I.e. should the output be

 (10,3,4,8) or (10,8,3,4).

 To resolve this ambiguity, the code sends the (3,4) sub-space to
 the front of the dimensions and returns (3,4,10,8).   In
 retro-spect, the code should raise an error as I doubt anyone
 actually relies on this behavior, and then we could have done the
 right thing for situations like in1 being an integer which actually
 makes some sense and should not have been confused with the general
 case

 In this particular case you might also think that we could say the
 result should be (10,3,8,4) but there is no guarantee that the number
 of dimensions that should be appended by the fancy-indexing objects
 will be the same as the number of dimensions replaced.Again, this
 is how fancy-indexing combines with other fancy-indexing objects.

 So, the behavior is actually quite predictable, it's just that in
 some common cases it doesn't do what you would expect --- especially
 if you think that [0,1] is the same as :2.   When I wrote this code
 to begin with I should have raised an error and then worked in the
 cases that make sense.This is a good example of making the
 mistake of thinking that it's better to provide something very
 general rather than just raise an error when an obvious and clear
 solution is not available.

 There is the possibility that we could now raise an error in NumPy
 when this situation is encountered because I strongly doubt anyone is
 actually relying on the current behavior.I would like to do this,
 actually, as soon as possible.  Comments?

Travis,

Good idea, especially if you can then make the integer case work as one 
might reasonably expect.  Keeping the present too-fancy capabilities can 
only cause continuing confusion.
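
To make the surprising case concrete (a sketch with illustrative shapes):

import numpy as np

a = np.arange(10 * 5 * 8 * 6).reshape(10, 5, 8, 6)
in1 = np.zeros((3, 4), dtype=int)
in2 = np.zeros((3, 4), dtype=int)
print(a[:, in1, :, in2].shape)   # (3, 4, 10, 8) -- subspace moved to front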

Eric


 -Travis

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

2012-04-17 Thread Eric Firing
On 04/17/2012 08:40 AM, Matthew Brett wrote:
 Hi,

 On Tue, Apr 17, 2012 at 7:24 AM, Nathaniel Smithn...@pobox.com  wrote:
 On Tue, Apr 17, 2012 at 5:59 AM, Matthew Brettmatthew.br...@gmail.com  
 wrote:
 Hi,

 On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphanttra...@continuum.io  
 wrote:
 Mark and I will have conversations about NumPy while he is in Austin.   
 There are many other active stake-holders whose opinions and views are 
 essential for major changes.Mark and I are working on other things 
 besides just NumPy and all NumPy changes will be discussed on list and 
 require consensus or super-majority for NumPy itself to change. I'm 
 not sure if that helps.   Is there more we can do?

 As you might have heard me say before, my concern is that it has not
 been easy to have good discussions on this list.   I think the problem
 has been that is has not been clear what the culture was, and how
 decisions got made, and that had led to some uncomfortable and
 unhelpful discussions.  My plea would be for you as BDF$N to strongly
 encourage on-list discussions and discourage off-list discussions as
 far as possible, and to help us make the difficult public effort to
 bash out the arguments to clarity and consensus.  I know that's a big
 ask.

 Hi Matthew,

 As you know, I agree with everything you just said :-). So in interest
 of transparency, I should add: I have been in touch with Travis some
 off-list, and the main topic has been how to proceed in a way that
 let's us achieve public consensus.

...when possible without paralysis.


 I'm glad to hear that discussion is happening, but please do have it
 on list.   If it's off list it easy for people to feel they are being
 bypassed, and that the public discussion is not important.  So, yes,
 you might get a better outcome for this specific case, but a worse
 outcome in the long term, because the list will start to feel that
 it's for signing off or voting rather than discussion, and that - I
 feel sure - would lead to worse decisions.

I think you are over-stating the case a bit.  Taking what you say 
literally, one might conclude that numpy people should never meet and 
chat, or phone each other up and chat.  But such small conversations are 
an important extension and facilitator of individual thinking. Major 
decisions do need to get hashed out publicly, but mailing list 
discussions are only one part of the thinking and decision process.

Eric


 The other issue is that there's a reason you are having the discussion
 off-list - which is that it was getting difficult on-list.  But -
 again - a personal view - that really has to be addressed directly by
 setting out the rules of engagement and modeling the kind of
 discussion we want to have.

 Cheers,

 Matthew
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Masked Arrays in NumPy 1.x

2012-04-10 Thread Eric Firing
On 04/09/2012 06:52 PM, Travis Oliphant wrote:
 Hey all,

 I've been waiting for Mark Wiebe to arrive in Austin where he will
 spend several weeks, but I also know that masked arrays will be only
 one of the things he and I are hoping to make head-way on while he is
 in Austin.Nevertheless, we need to make progress on the masked
 array discussion and if we want to finalize the masked array
 implementation we will need to finish the design.

 I've caught up on most of the discussion including Mark's NEP,
 Nathaniel's NEP and other writings and the very-nice mailing list
 discussion that included a somewhat detailed discussion on the
 algebra of IGNORED.   I think there are some things still to be
 decided.  However, I think some things are pretty clear:

 1) Masked arrays are going to be fundamental in NumPy and these
 should replace most people's use of numpy.ma.   The numpy.ma code
 will remain as a compatibility layer

Excellent!  In mpl and other heavy users of numpy.ma there will still be 
work to do to handle all varieties of input, but it should be manageable.


 2) The reality of #1 and NumPy's general philosophy to date means
 that masked arrays in NumPy should support the common use-cases of
 masked arrays (including getting and setting of the mask from the
 Python and C-layers).  However, the semantic of what the mask implies
 may change from what numpy.ma uses to having  a True value meaning
 selected.

I never understood a strong argument for that change from numpy.ma. 
When editing data, it is natural to use flag bits to indicate various 
rejection criteria; no bit set means it's all good, so a False is 
naturally good and True is naturally mask it out.  But I can live 
with the change if you and Mark see a good reason for it.


 3) There will be missing-data dtypes in NumPy.   Likely
 only a limited sub-set (string, bytes, int64, int32, float32,
 float64, complex64, complex32, and object) with an API that allows
 more to be defined if desired.   These will most likely use Mark's
 nice machinery for managing the calculation structure without
 requiring new C-level loops to be defined.

So, these will be the bit-pattern versions of NA, correct?  With the bit 
pattern specified as an attribute of the dtype?  Good, but...

Are we getting into trouble here, figuring out how to handle all 
combinations of numpy.ma, masked dtypes, and Mark's masked NA?


 4) I'm still not sure about whether the IGNORED concept is necessary
 or not.I really like the separation that was emphasized between
 implementation (masks versus bit-patterns) and operations
 (propagating versus non-propagating).   Pauli even created another
 dimension which I don't totally grok and therefore can't remember.
 Pauli?  Do you still feel that is a necessary construction?  But, do
 we need the IGNORED concept to indicate what amounts to different
 default key-word arguments to functions that operate on NumPy arrays
 containing missing data (however that is represented)?My current
 weak view is that it is not really necessary.   But, I could be
 convinced otherwise.

I agree (if I understand you correctly); the goal is an expressive, 
explicit language that lets people accomplish what they want, clearly 
and quickly, and I think this is more a matter of practicality than 
purity of theory.  Nevertheless, achieving that is easier said than 
done, and figuring out how to handle corner cases is better done sooner 
than later.

Numpy.ma has never been perfect, but it has proven a good tool for 
practical work in my experience.  (Many thanks to Pierre GM for all his 
work on it.) One of the nice things it does is to automatically mask out 
invalid results.  This saves quit a bit of explicit checking that would 
otherwise be required.

Eric


 I think the good news is that given Mark's hard-work and Nathaniel's
 follow-up we are really quite far along.   I would love to get
 Nathaniel's opinion about what remains un-done in the current NumPy
 code-base.   I would also appreciate knowing (from anyone with an
 interest) opinions of items 1-4 above and anything else I've left
 out.

 Thanks,

 -Travis
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Using logical function on more than 2 arrays, availability of a between function ?

2012-03-25 Thread Eric Firing
On 03/25/2012 06:55 AM, Pierre Haessig wrote:
 Hi,

 I have an off topic but somehow related question :

 Le 19/03/2012 12:04, Matthieu Rigal a écrit :
 array = numpy.logical_and(numpy.logical_and(aBlueChannel < 1.0, aNirChannel >
 (aBlueChannel * 1.0)), aNirChannel < (aBlueChannel * 1.8))
 Is there any significant difference between :

 z = np.logical_and(x, y) and
 z = x & y (assuming x and y are already numpy arrays and not just lists)

 I've always used the & (and | and ~) operators because it's of course
 much shorter ;-)

 I've seen no mention of the & operator in the np.logical_and docstring so
 I wonder...

There is a big difference: &, |, and ~ are bitwise operators, not 
logical operators, so they work like logical operators only if operating 
on booleans (or at least arrays containing nothing but integer zeros and 
ones) and only if you bear in mind that & and | bind more tightly than 
comparisons, unlike their logical counterparts.  Therefore you often 
need to use more parentheses than you might have expected.

In [1]: a = np.array([1])

In [2]: b = np.array([2])

In [5]: np.logical_and(a,b)
Out[5]: array([ True], dtype=bool)

In [6]: a & b
Out[6]: array([0])


Using the bitwise operators in place of logical operators is a hack to 
get around limitations of the language; but, if done carefully, it is a 
useful one.
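
Applied to the original question, a sketch of a vectorized between with 
the bitwise operators (assuming the comparison directions in the quoted 
line; note the parentheses forced by precedence):

between = (aNirChannel > aBlueChannel * 1.0) & (aNirChannel < aBlueChannel * 1.8)
result = (aBlueChannel < 1.0) & between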

Eric


 Best,
 Pierre




 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Using logical function on more than 2 arrays, availability of a between function ?

2012-03-25 Thread Eric Firing
On 03/25/2012 12:22 PM, Pierre Haessig wrote:
 Hi Eric,

 Thanks for the hints !

 Le 25/03/2012 20:33, Eric Firing a écrit :
 Using the bitwise operators in place of logical operators is a hack to
 get around limitations of the language; but, if done carefully, it is a
 useful one.
 What is the rationale behind not overloading __and__  other logical
 operations ?
 Is it a requirement that boolean operators should always return *a bool*
 and not an *array of bools* ?

Pierre,

See http://www.python.org/dev/peps/pep-0335/

Eric
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Eric Firing
On 03/07/2012 09:26 AM, Nathaniel Smith wrote:
 On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
 charlesr.har...@gmail.com  wrote:
 On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessigpierre.haes...@crans.org
 Coming back to Travis proposition bit-pattern approaches to missing
 data (*at least* for float64 and int32) need to be implemented., I
 wonder what is the amount of extra work to go from nafloat64 to
 nafloat32/16 ? Is there an hardware support NaN payloads with these
 smaller floats ? If not, or if it is too complicated, I feel it is
 acceptable to say it's too complicated and fall back to mask. One may
 have to choose between fancy types and fancy NAs...

 I'm in agreement here, and that was a major consideration in making a
 'masked' implementation first.

 When it comes to missing data, bitpatterns can do everything that
 masks can do, are no more complicated to implement, and have better
 performance characteristics.

 Also, different folks adopt different values
 for 'missing' data, and distributing one or several masks along with the
 data is another common practice.

 True, but not really relevant to the current debate, because you have
 to handle such issues as part of your general data import workflow
 anyway, and none of these is any more complicated no matter which
 implementations are available.

 One inconvenience I have run into with the current API is that it should be
 easier to clear the mask from an ignored value without taking a new view
 or assigning known data. So maybe two types of masks (different payloads),
 or an additional flag could be helpful. The process of assigning masks could
 also be made a bit easier than using fancy indexing.

 So this, uh... this was actually the whole goal of the alterNEP
 design for masks -- making all this stuff easy for people (like you,
 apparently?) that want support for ignored values, separately from
 missing data, and want a nice clean API for it. Basically having a
 separate .mask attribute which was an ordinary, assignable array
 broadcastable to the attached array's shape. Nobody seemed interested
 in talking about it much then but maybe there's interest now?

In other words, good low-level support for numpy.ma functionality?  With 
a migration path so that a separate numpy.ma might wither away?  Yes, 
there is interest; this is exactly what I think is needed for my own 
style of applications (which I think are common at least in geoscience), 
and for matplotlib.  The question is how to achieve it as simply and 
cleanly as possible while also satisfying the needs of the R users, and 
while making it easy for matplotlib, for example, to handle *any* 
reasonable input: ma, other masking, nan, or NA-bitpattern.

It may be that a rather pragmatic approach to implementation will prove 
better than a highly idealized set of data models.  Or, it may be that a 
dual approach is best, in which the flag value missing data 
implementation is tightly bound to the R model and the mask 
implementation is explicitly designed for the numpy.ma model. In any 
case, a reasonable level of agreement on the goals is needed.  I presume 
Travis's involvement will facilitate a clarification of the goals and of 
the implementation; and I expect that much of Mark's work will end up 
serving well, even if much needs to be added and the API evolves 
considerably.

Eric


 -- Nathaniel
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Eric Firing
On 03/07/2012 11:15 AM, Pierre Haessig wrote:
 Hi,
 Le 07/03/2012 20:57, Eric Firing a écrit :
 In other words, good low-level support for numpy.ma functionality?
 Coming back to *existing* ma support, I was just wondering whether it
 was now possible to np.save a masked array.
 (I'm using numpy 1.5)

No, not with the mask preserved.  This is one of the improvements I am 
hoping for with the upcoming missing data work.

Eric

 In the end, this is the most annoying problem I have with the existing
 ma module which otherwise is pretty useful to me. I'm happy not to need
 to process 100% of my data though.

 Best,
 Pierre




 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Eric Firing
On 02/17/2012 09:55 PM, David Cournapeau wrote:
 I may not have explained it very well: my whole point is that we don't
 recruit people, where I understand recruit as hiring full-time,
 professional programmers.  We need more people who can casually spend a few
 hours - typically grad students, scientists with an itch. There is no
 doubt that more professional programmers know c++ compared to C. But a
 community project like numpy has different requirements than a
 professional project.


My sense from the thread so far is that the C++ push is part of the new 
vision, in which numpy will make the transition to a more professional 
level, with paid developers, and there will no longer be the expectation 
that grad students, scientists with an itch will dive into the 
innermost guts of the code.  The guts will be more like Qt or AGG or 
0MQ--solid, documented libraries that just work (I think--I don't really 
know that much about these examples), so we can take them for granted 
and worry about other things instead.   If that can be accomplished, it 
is certainly more than fine with me; and if the best way to accomplish 
that is with C++, so be it.

Eric
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] change the mask state of one element in a masked array

2012-02-18 Thread Eric Firing
On 02/18/2012 05:52 AM, Chao YUE wrote:
 Dear all,

 I built a new empty masked array:

 In [91]: a=np.ma.empty((2,5))

Of course this only makes sense if you are going to immediately populate 
the array.


 In [92]: a
 Out[92]:
 masked_array(data =
   [[  1.20569155e-312   3.34730819e-316   1.13580079e-316   1.11459945e-316
  9.69610549e-317]
   [  6.94900258e-310   8.48292532e-317   6.94900258e-310   9.76397825e-317
  6.94900258e-310]],
   mask =
   False,
 fill_value = 1e+20)


 as you see, the mask for all the elements are false. so how can I set
 for some elements to masked elements (mask state as true)?
 let's say, I want a[0,0] to be masked.

a[0,0] = np.ma.masked
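
To unmask it later, just assign a value (the mask is soft by default):

a[0, 0] = 3.14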

Eric


 thanks  cheers,

 Chao

 --
 ***
 Chao YUE
 Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
 UMR 1572 CEA-CNRS-UVSQ
 Batiment 712 - Pe 119
 91191 GIF Sur YVETTE Cedex
 Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
 



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-17 Thread Eric Firing
On 02/17/2012 05:39 AM, Charles R Harris wrote:


 On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau courn...@gmail.com
 mailto:courn...@gmail.com wrote:

 Hi Travis,

 On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant
 tra...@continuum.io mailto:tra...@continuum.io wrote:
   Mark Wiebe and I have been discussing off and on (as well as
 talking with Charles) a good way forward to balance two competing
 desires:
  
  * addition of new features that are needed in NumPy
  * improving the code-base generally and moving towards a
 more maintainable NumPy
  
  I know there are loud voices for just focusing on the second of
 these and avoiding the first until we have finished that.  I
 recognize the need to improve the code base, but I will also be
 pushing for improvements to the feature-set and user experience in
 the process.
  
   As a result, I am proposing a rough outline for releases over the
 next year:
  
  * NumPy 1.7 to come out as soon as the serious bugs can be
 eliminated.  Bryan, Francesc, Mark, and I are able to help triage
 some of those.
  
  * NumPy 1.8 to come out in July which will have as many
 ABI-compatible feature enhancements as we can add while improving
 test coverage and code cleanup.   I will post to this list more
 details of what we plan to address with it later.Included for
 possible inclusion are:
  * resolving the NA/missing-data issues
  * finishing group-by
  * incorporating the start of label arrays
  * incorporating a meta-object
  * a few new dtypes (variable-length string,
  variable-length unicode and an enum type)
  * adding ufunc support for flexible dtypes and possibly
 structured arrays
  * allowing generalized ufuncs to work on more kinds of
 arrays besides just contiguous
  * improving the ability for NumPy to receive JIT-generated
 function pointers for ufuncs and other calculation opportunities
  * adding filters to Input and Output
  * simple computed fields for dtypes
  * accepting a Data-Type specification as a class or JSON file
  * work towards improving the dtype-addition mechanism
  * re-factoring of code so that it can compile with a C++
 compiler and be minimally dependent on Python data-structures.

 This is a pretty exciting list of features. What is the rationale for
 code being compiled as C++? IMO, it will be difficult to do so
 without preventing useful C constructs, and without removing some of
 the existing features (like our use of C99 complex). The subset that
 is both C and C++ compatible is quite constraining.


 I'm in favor of this myself, C++ would allow a lot code cleanup and make
 it easier to provide an extensible base, I think it would be a natural
 fit with numpy. Of course, some C++ projects become tangled messes of
 inheritance, but I'd be very interested in seeing what a good C++
 designer like Mark, intimately familiar with the numpy code base, could
 do. This opportunity might not come by again anytime soon and I think we
 should grab onto it. The initial step would be a release whose code
 would compile as both C and C++, which mostly comes down to removing C++
 keywords like 'new'.

 I did suggest running it by you for build issues, so please raise any
 you can think of. Note that MatPlotLib is in C++, so I don't think the
 problems are insurmountable. And choosing a set of compilers to support
 is something that will need to be done.

It's true that matplotlib relies heavily on C++, both via the Agg 
library and in its own extension code.  Personally, I don't like this; I 
think it raises the barrier to contributing.  C++ is an order of 
magnitude more complicated than C--harder to read, and much harder to 
write, unless one is a true expert. In mpl it brings reliance on the CXX 
library, which Mike D. has had to help maintain.  And if it does 
increase compiler specificity, that's bad.

I would much rather see development in the direction of sticking with C 
where direct low-level control and speed are needed, and using cython to 
gain higher level language benefits where appropriate.  Of course, that 
brings in the danger of reliance on another complex tool, cython.  If 
that danger is considered excessive, then just stick with C.

Eric


 Chuck


Re: [Numpy-discussion] Numpy governance update

2012-02-15 Thread Eric Firing
On 02/15/2012 08:50 AM, Matthew Brett wrote:
 Hi,

 On Wed, Feb 15, 2012 at 5:51 AM, Alan G Isaac alan.is...@gmail.com wrote:
 On 2/14/2012 10:07 PM, Bruce Southey wrote:
 The one thing that gets over looked here is that there is a huge
 diversity of users with very different skill levels. But very few
 people have an understanding of the core code. (In fact the other
 thread about type-casting suggests that it is extremely few people.)
 So in all of this, I do not yet see 'community'.


 As an active user and long-time list member
 who has never even looked at the core code,
 I perhaps presumptuously urge a moderation
 of rhetoric. I object to the idea that users
 like myself do not form part of the community.

 This list has 1400 subscribers, and the fact that
 most of us are quiet most of the time does not mean we
 are not interested or attentive to the discussions,
 including discussions of governance.

 It looks to me like this will be great for NumPy.
 People who would otherwise not be able to spend much
 time on NumPy will be spending a lot of time improving
 the code and adding features. In my view, this will help
 NumPy advance which will enlarge the user community, which will
 slowly but inevitably enlarge the contributor community.
 I'm pretty excited about Travis's bold efforts to find
 ways to allow him and others to spend more time on NumPy.
 I wish him the best of luck.

 I think it is important to stick to the thread topic here, which is
 'Governance'.

Do you have in mind a model of how this might work?  (I suspect you have 
already answered a question like that in some earlier thread; sorry.)  A 
comparable project that is doing it right?

Governance implies enforcement power, doesn't it?  Where, how, and by 
whom would the power be exercised?

 It's not about whether it is good or bad that Travis has re-engaged in
 Numpy and is funding development in Numpy through his company.   I'm
 personally very glad to see Travis back on the list and engaged again,
 but that's really not what the thread is about.

 The thread is about whether we need explicit Numpy governance,
 especially in the situation where one new company will surely dominate
 numpy development in the short term at least.

 I would say - for the benefit of Continuum Analytics and for the Numpy
 community, there should be explicit governance, that takes this
 relationship into account.

Please elaborate; are you saying that Continuum Analytics must develop 
numpy as decided by some outside body?

Eric


 I believe that leaving the governance informal and underspecified at
 this stage would be a grave mistake, for everyone concerned.

 Best,

 Matthew


Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?

2012-02-14 Thread Eric Firing
On 02/13/2012 08:07 PM, Charles R Harris wrote:



 Let it go, Travis. It's a waste of time.

(Off-list) Chuck, I really appreciate your consistent good sense; this 
is just one of many examples.  Thank you for all your numpy work.

Eric


Re: [Numpy-discussion] Migrating issues to GitHub

2012-02-11 Thread Eric Firing
On 02/11/2012 10:44 AM, Travis Oliphant wrote:
 This is good feedback.

 It looks like there are 2 concerns:

 1) no way to add attachments --- it would seem that gists and indeed
 other github repos solves that problem.

Not really, in practice.  Yes one can use these mechanisms, but they are 
much clunkier and more obscure than simply being able to attach files 
via an immediate interface.  So the barrier to actual use is high.

 2) You must be an admin to label an issue (i.e. set it as a bug,
 enhancement, or so forth).

A third problem is that the entire style of presentation is poorly 
designed from a use standpoint, in comparison to the sourceforge tracker 
which mpl used previously.  The github tracker appears to have been 
designed by a graphics person, not a software maintainer.  The 
information density in the issue list is very low; it is impossible to 
scan a large number of issues at once; there doesn't seem to be any 
useful sorting and selection mechanism.

 This second concern seems more of a problem. Perhaps this is something
 that can be brought up with the github developers directly. Not
 separating issue permissions from code permissions seems rather
 unfortunate, and creates work for all admins.

This doesn't seem so bad to me, at least compared to the *really* bad 
aspects.


 On the other hand, it might force having an admin who is paying regular
 attention to the issues which is not necessarily a bad thing.

 So, despite the drawback, it seems that having issues on Trac and having
 code-conversations on those issues happening separately from the
 pull-request conversations is even less optimal.

The one good thing about the github tracker is its integration with the 
code.  Otherwise it is still just plain bad, and will remain so until it 
is given an information-dense tabular interface, with things like 
initiation date, last update, category, priority, etc.  Down with 
whitespace and icons! We need information!

Eric

 -Travis



 On Feb 11, 2012, at 2:06 PM, Benjamin Root wrote:



 On Saturday, February 11, 2012, Travis Oliphant tra...@continuum.io wrote:
  How do people feel about moving the issue tracking for NumPy to
 Github? It looks like they have improved their issue tracking quite a
 bit and the workflow and integration with commits looks quite good
 from what I can see.
  Here is one tool I saw that might help in the migration:
 https://github.com/trustmaster/trac2github
  Are there others?
  -Travis
 

 This is probably less of an issue for numpy, but our biggest complaint
 about the github tracker for matplotlib is the inability for users to
 add attachments.

 The second complaint is that it is awkward to assign priorities (has
 to be done via labels). Particularly, users can not apply labels
 themselves.

 Mind you, neither of these complaints were enough to completely
 preclude mpl from migrating, but it should be taken into consideration.

 Cheers!
 Ben Root





Re: [Numpy-discussion] numpy.arange() error?

2012-02-09 Thread Eric Firing
On 02/09/2012 09:20 AM, Drew Frank wrote:
 Eric Firing efiring at hawaii.edu writes:


 On 02/08/2012 09:31 PM, teomat wrote:

 Hi,

 Am I wrong, or is the numpy.arange() function not 100% correct?

 Try to do this:

 In [7]: len(np.arange(3.1, 4.9, 0.1))
 Out[7]: 18

 In [8]: len(np.arange(8.1, 9.9, 0.1))
 Out[8]: 19

 I would expect the same result for each command.

 Not after more experience with the wonders of floating point!
 Nice-looking decimal numbers often have long, drawn-out, inexact
 floating point (base 2) representations.  That leads to exactly this
 sort of problem.

 numpy.linspace is provided to help get around some of these surprises;
 or you can use an integer sequence and then scale and shift it.

 Eric


 All the best



 I also found this surprising -- not because I lack experience with floating
 point, but because I do have experience with MATLAB.  In MATLAB, the
 corresponding operation 3.1:0.1:4.9 has length 19 because of an explicit
 tolerance parameter used in the implementation
 (http://www.mathworks.com/support/solutions/en/data/1-4FLI96/index.html?solution=1-4FLI96).

 Of course, NumPy is not MATLAB :).  That said, I prefer the MATLAB behavior in
 this case -- even though it has a bit of a magic feel to it, I find it
 hard to imagine code that operates correctly given the Python semantics
 and incorrectly under MATLAB's.  Thoughts?

You raise a good point.  Neither arange nor linspace provides a close 
equivalent to the nice behavior of the Matlab colon, even though that is 
often what one really wants.  Adding this, either via an arange kwarg, a 
linspace kwarg, or a new function, seems like a good idea.
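
For illustration, a rough sketch of what such a tolerant function might
look like (hypothetical; the absolute 1e-10 fudge is a crude stand-in
for Matlab's relative tolerance, and positive steps are assumed):

import numpy as np

def arange_tol(start, stop, step):
    # round the step count with a small tolerance, so that 3.1:0.1:4.9
    # and 8.1:0.1:9.9 both give 19 points, as the Matlab colon does
    n = int(np.floor((stop - start) / step + 1e-10)) + 1
    return start + step * np.arange(n)

len(arange_tol(3.1, 4.9, 0.1))   # 19
len(arange_tol(8.1, 9.9, 0.1))   # 19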

Eric






Re: [Numpy-discussion] numpy.arange() error?

2012-02-08 Thread Eric Firing
On 02/08/2012 09:31 PM, teomat wrote:

 Hi,

 Am I wrong, or is the numpy.arange() function not 100% correct?

 Try to do this:

 In [7]: len(np.arange(3.1, 4.9, 0.1))
 Out[7]: 18

 In [8]: len(np.arange(8.1, 9.9, 0.1))
 Out[8]: 19

 I would expect the same result for each command.

Not after more experience with the wonders of floating point! 
Nice-looking decimal numbers often have long, drawn-out, inexact 
floating point (base 2) representations.  That leads to exactly this 
sort of problem.

numpy.linspace is provided to help get around some of these surprises; 
or you can use an integer sequence and then scale and shift it.
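
For example, either of the following avoids the surprise (linspace
keeps the end points exact):

import numpy as np

np.linspace(3.1, 4.9, 19)     # specify the count instead of the step
3.1 + 0.1 * np.arange(19)     # or scale and shift an integer sequence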

Eric


 All the best





Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)

2011-10-29 Thread Eric Firing
On 10/29/2011 12:26 AM, Ralf Gommers wrote:
 The history of this discussion doesn't suggest it is straightforward to
 get a design right the first time. It's a complex subject.

 The second part of your statement, and then implement, sounds so
 simple. The reality is that there are only a handful of developers who
 have done a significant amount of work on the numpy core in the last two
 years. I haven't seen anyone saying they are planning to implement (part
 of) whatever design the outcome of this discussion will be. I don't
 think it's strange to keep this in mind to some extent.

...including the fact that last summer, Mark had a brief one-time 
opportunity to contribute major NA code.  I expect that even if some 
modifications are made to what he contributed, letting him get on with 
it will turn out to have been the right move.

Apparently Travis hopes to put in a burst of coding in 2012:

http://technicaldiscovery.blogspot.com/2011/10/thoughts-on-porting-numpy-to-pypy.html

Go to the section "NumPy will be evolving rapidly over the coming
years."  Note that "missing data bit-patterns" is on his list,
consistent with his most recent messages.

Eric


Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)

2011-10-29 Thread Eric Firing
On 10/29/2011 12:02 PM, Olivier Delalleau wrote:


 I haven't been following the discussion closely, but wouldn't it be instead:
 a.mask[0:2] = True?

That would be consistent with numpy.ma and the opposite of Mark's 
implementation.

I can live with either, but I much prefer the numpy.ma version because 
it fits with the use of bit-flags for editing data; set bit 1 if it 
fails check A, set bit 2 if it fails check B, etc.  So, if it evaluates 
as True, there is a problem, and the value is masked *out*.
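
As a small illustration of that editing style (the flag names and the
checks here are made up for the example):

import numpy as np

BAD_RANGE, BAD_SPIKE = 0x01, 0x02            # hypothetical editing flags
x = np.array([1.0, 50.0, 3.0, 3.2])
flags = np.zeros(x.shape, dtype=np.uint8)
flags[x > 10.0] |= BAD_RANGE                 # fails check A
flags[np.abs(x - np.median(x)) > 20.0] |= BAD_SPIKE   # fails check B
xm = np.ma.MaskedArray(x, mask=(flags != 0)) # any set bit: masked *out*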

Similarly, in Mark's implementation, 7 bits are available for a payload 
to describe what kind of masking is meant.  This seems more consistent 
with True as masked (or NA) than with False as masked.

Eric


 It's something that I actually find a bit difficult to get right in the
 current numpy.ma implementation: I would find it more
 intuitive to have True for valid data, and False for invalid / missing
 / ... I realize how the implementation makes sense (and is appropriate
 given that the name is mask), but I just thought I'd point this out...
 even if it's just me ;)

 -=- Olivier





Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)

2011-10-29 Thread Eric Firing
On 10/29/2011 12:57 PM, Charles R Harris wrote:


 On Sat, Oct 29, 2011 at 4:47 PM, Eric Firing efir...@hawaii.edu wrote:

 On 10/29/2011 12:02 PM, Olivier Delalleau wrote:

  
   I haven't been following the discussion closely, but wouldn't it
 be instead:
   a.mask[0:2] = True?

 That would be consistent with numpy.ma and the opposite of Mark's
 implementation.

 I can live with either, but I much prefer the numpy.ma version because
 it fits with the use of bit-flags for editing data; set bit 1 if it
 fails check A, set bit 2 if it fails check B, etc.  So, if it evaluates
 as True, there is a problem, and the value is masked *out*.

 Similarly, in Mark's implementation, 7 bits are available for a payload
 to describe what kind of masking is meant.  This seems more consistent
 with True as masked (or NA) than with False as masked.


 I wouldn't rely on the 7 bits yet. Mark left them available to keep open
 possible future use, but didn't implement anything using them yet. If
 memory use turns out to exclude whole sectors of application we will
 have to go to bit masks.

Right; I was only commenting on a subjective sense of internal 
consistency.  A minor point.

The larger context of all this is how users end up being able to work 
with all the different types and specifications of NA (in the most 
general sense) data:

1) nans
2) numpy.ma
3) masks in the core (Mark's new code)
4) bit patterns

Substantial code now in place--including matplotlib--relies on numpy.ma. 
  It has some rough edges, it can be slow, it is a pain having it as a 
bolted-on module, it may be more complicated than it needs to be, but it 
fits a lot of use cases pretty well.  There are many users.  Everyone 
using matplotlib is using it, whether they know it or not.

The ideal from my numpy.ma-user's standpoint would an NA-handling 
implementation in the core that would do two things:
(1) allow a gradual transition away from numpy.ma, so that the latter 
would become redundant.
(2) allow numpy.ma to be reasonably easily modified to use the in-core 
facilities for greater efficiency during the long transition.  Implicit 
is the hope that someone (most likely not me, although I might be able 
to help a bit) would actually perform this modification.

Mark's mission, paid for by Enthought, was not to please numpy.ma users, 
but to add NA-handling that would be comfortable for R-users.  He chose 
to do so with the idea that two possible implementations (masks and 
bitpatterns) were desirable, each with strengths and weaknesses, and 
that so as to get *something* done in the very short time he had left, 
he would start with the mask implementation.  We now have the result, 
incomplete, but not breaking anything.  Additional development (coding 
as well as designing) will be needed.

The main question raised by Matthew and Nathaniel is, I think, whether 
Mark's code should develop in a direction away from the R-compatibility 
model, with the idea that the latter would be handled via a bit-pattern 
implementation, some day, when someone codes it; or whether it should 
remain as the prototype and first implementation of an API to handle the 
R-compatible use case, minimizing any divergence from any eventual 
bit-pattern implementation.

The answer to this depends on several questions, including:

1) Who is available to do how much implementation of any of the 
possibilities?  My reading of Travis's blog and rare posts to this list 
suggest that he hopes and expects to be able to free up coding time. 
Perhaps he will clarify that soon.

2) What sorts of changes would actually be needed to make the present 
implementation good enough for the R use case?  Evolutionary, or 
revolutionary?

3) What sorts of changes would help with the numpy.ma use case? 
Evolutionary, or revolutionary?

4) Given available resources, how can we maximize progress: making numpy 
more capable, easier to use, etc.

Unless the answers to questions 2 *and* 3 are revolutionary, I don't 
see the point in pulling Mark's changes out of master.  At most, the 
documentation might be changed to mark the NA API as experimental for 
a release or two.

Overall, I think that the differences between the R use case and the ma 
use case have been overstated and over-emphasized.

Eric



 Chuck





Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-25 Thread Eric Firing
On 10/25/2011 04:56 PM, Travis Oliphant wrote:
 So, I am very interested in making sure I remember the details of the
 counterproposal.What I recall is that you wanted to be able to
 differentiate between a bit-pattern mask and a boolean-array mask
 in the API.   I believe currently even when bit-pattern masks are
 implemented the difference will be hidden from the user on the
 Python level.

 I am sure to be missing other parts of the discussion as I have been
 in and out of it.

 Thanks,

 -Travis

The alternative-NEP is here: https://gist.github.com/1056379/

One thread of discussion is here:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg32268.html

and continued here:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg32371.html

Eric


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Eric Firing
On 10/23/2011 10:49 AM, Nathaniel Smith wrote:
 But I (and presumably others) were unaware of the pull request,
 because it turns out that actually Mark did *not* point to the pull
 request, at least in email to either me or numpy-discussion. As far as
 I can tell, the first time that pull request has ever been mentioned
 on the list is in Pauli's email today. (I did worry I might have
 missed it, so I just double-checked the archives for August 18-August
 27, which is the time period the pull request was open, and couldn't
 find anything there.)

Ideally, Mark's message announcing that his branch was ready for testing 
(a message that started a thread of constructive comment) would have 
mentioned the pull request:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg33151.html

Ultimately, though, the numpy core developers must decide what goes in 
and what does not.  Consensus is desirable but may not always be 
possible or optimal, especially if "consensus" is interpreted as
"unanimity".  There is a risk in deciding to accept a major change, but
it is mitigated by the ability to make future changes, and it is a risk 
that must be taken if progress is to be made.  As a numpy user, I was 
pleased to see Travis make the decision that Mark should get on with the 
coding, and I was pleased to see Charles make the decision to merge the 
pull request.

Eric


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Eric Firing
On 10/23/2011 12:34 PM, Nathaniel Smith wrote:

 like. And in this case I do think we can come up with an API that will
 make everyone happy, but that Mark's current API probably can't be
 incrementally evolved to become that API.)


No one could object to coming up with an API that makes everyone happy, 
provided that it actually gets coded up, tested, and is found to be fast 
and maintainable.  When you say the API probably can't be evolved, do 
you mean that the underlying implementation also has to be redone?  And 
if so, who will do it, and when?

Eric


Re: [Numpy-discussion] wanted: decent matplotlib alternative

2011-10-13 Thread Eric Firing
On 10/13/2011 12:22 PM, Gökhan Sever wrote:


 On Thu, Oct 13, 2011 at 4:15 PM, Benjamin Root ben.r...@ou.edu wrote:

 Myself and other developers would greatly appreciate help from the
 community to point out which examples are too confusing or out of
 date. We


 It would be nice to have a social interface for the mpl gallery,
 similar to the R graph gallery
 [http://www.r-bloggers.com/the-r-graph-gallery-goes-social/]

I think that the priority should go towards massive pruning, 
organization, and cleanup of the gallery.  This would be a great project 
for a new contributor to mpl.

Eric



 --
 Gökhan


Re: [Numpy-discussion] numpy.interp running time

2011-08-16 Thread Eric Firing
On 08/16/2011 04:22 AM, Timo Kluck wrote:
 2011/8/1 Timo Kluck tkl...@infty.nl:
 I just submitted a patch at
 http://projects.scipy.org/numpy/ticket/1920 . It implements Eric's
 suggestion. Please review, I'll be happy to adapt it to any of your
 feedback.

 I submitted a minor patch a while ago. It hasn't been reviewed yet,
 but I don't know whether that's just because the reviewers just
 haven't had time yet, or whether some extra action is required on my
 part. Perhaps the ticket should be 'tagged' for review, or similar?
 Let me know if there's anything more that I should do.

 Timo

Timo,

I suspect the one thing that would improve the likelihood of review 
would be if you were to supply the patch via a github pull request.  In 
addition, posting a timing test (code and results) might help.

Eric


Re: [Numpy-discussion] Reading a big netcdf file

2011-08-03 Thread Eric Firing
On 08/03/2011 11:24 AM, Gökhan Sever wrote:

 I[1]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
 1 loops, best of 3: 263 ms per loop

You need to clear your cache and then run timeit with options -n1 -r1.
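
For example, in IPython (the drop_caches line is the Linux way to flush
the OS page cache; adjust for your system):

import numpy as np
# first, in a shell:  sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
%timeit -n1 -r1 np.fromfile('temp.npa', dtype=np.uint16)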

Eric


Re: [Numpy-discussion] numpy.interp running time

2011-07-30 Thread Eric Firing
On 07/29/2011 11:18 AM, Timo Kluck wrote:
 Dear numpy developers,

 The current implementation of numpy.interp(x,xp,fp) comes down to: first
 calculating all the slopes of the linear interpolant (there are
 len(xp)-1 of them), then using a binary search to find where x is in xp
 (running time log(len(xp))). So we obtain a running time of

 O( len(xp) + len(x)*log(len(xp)) )

 We could improve this to just

 O( len(x)*log(len(xp)) )

 by not caching the slopes. The point is, of course, that this is
 slightly slower in the common use case where x is a refinement of xp,
 and where you will have to compute all the slopes anyway.

 In my personal use case, however, I needed the value of the
 interp(x0,xp,fp) in order to calculate the next point x1 where I wanted
 to calculate interp(x1,xp,fp). The current implementation gave a severe
 running time penalty.

Maybe the thing to do is to pre-calculate the slopes only if
len(xp) <= len(x), or some such guess as to which method would be
more efficient.
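
For reference, a minimal sketch of the slope-free variant (not Timo's
patch; it assumes xp is sorted and clips at the end points the way
np.interp does):

import numpy as np

def interp_nocache(x, xp, fp):
    # O(len(x)*log(len(xp))): binary-search each x and compute the local
    # slope on the fly instead of caching all len(xp)-1 slopes up front
    xp = np.asarray(xp, dtype=float)
    fp = np.asarray(fp, dtype=float)
    x = np.clip(np.asarray(x, dtype=float), xp[0], xp[-1])
    i = np.clip(np.searchsorted(xp, x), 1, len(xp) - 1)
    x0, x1 = xp[i - 1], xp[i]
    y0, y1 = fp[i - 1], fp[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)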

Eric


 I have looked at the source and I could easily produce a patch for this.
 Would you be interested in it?

 Cheers,
 Timo Kluck





Re: [Numpy-discussion] code review request: masked dtype transfers

2011-07-09 Thread Eric Firing
On 07/08/2011 01:31 PM, Mark Wiebe wrote:
 I've just made pull request 105:

 https://github.com/numpy/numpy/pull/105


It's merged, which is good, but I have a suggestion relevant to that 
pull and I suspect to many others to come: use defines and macros to 
consolidate some of the implementation details.  For example:

#define MASK_TYPE npy_uint8
#define EXPOSE 1
#define HIDE 0
#define EXPOSED(mask) ( ((*(MASK_TYPE *)mask) & 0x01) == EXPOSE )

etc.

The potential advantages are readability, reduction of scope for typos, 
and ease of testing alternative implementation details, should that turn 
out to be desirable.  I am assuming that only a few expressions like 
EXPOSED will be needed in *many* places in the code.

Eric


Re: [Numpy-discussion] code review request: masked dtype transfers

2011-07-08 Thread Eric Firing
On 07/08/2011 01:31 PM, Mark Wiebe wrote:
 I've just made pull request 105:

 https://github.com/numpy/numpy/pull/105

 This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto,
 which behave analogously to the corresponding unmasked functions. To
 expose this with a reasonable interface, I added a function np.copyto,
 which takes a 'where=' parameter just like the element-wise ufuncs.

 One thing which needs discussion is that I've flagged 'putmask' and
 PyArray_PutMask as deprecated, because 'copyto' and PyArray_MaskedMoveInto
 handle what those functions do but in a more flexible fashion. If there
 are any objections to deprecating 'putmask' and PyArray_PutMask, please
 speak up!

 Thanks,
 Mark

Mark,

I thought I would do a test comparison of putmask and copyto, so I 
fetched and checked out your branch and tried to build it (after 
deleting my build directory), but the build failed:

numpy/core/src/multiarray/multiarraymodule_onefile.c:41:20: fatal error: 
nditer.c: No such file or directory
compilation terminated.
error: Command gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv 
-O2 -Wall -Wstrict-prototypes -fPIC -Inumpy/core/include 
-Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy 
-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core 
-Inumpy/core/src/npymath -Inumpy/core/src/multiarray 
-Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include 
-I/usr/include/python2.7 
-Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray 
-Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c 
numpy/core/src/multiarray/multiarraymodule_onefile.c -o 
build/temp.linux-x86_64-2.7/numpy/core/src/multiarray/multiarraymodule_onefile.o
 
failed with exit status 1

Indeed, with rgrep I see:
./numpy/core/src/multiarray/multiarraymodule_onefile.c:#include "nditer.c"

but no sign of nditer.c in the directory tree.

Eric


Re: [Numpy-discussion] code review request: masked dtype transfers

2011-07-08 Thread Eric Firing
On 07/08/2011 01:31 PM, Mark Wiebe wrote:
 I've just made pull request 105:

 https://github.com/numpy/numpy/pull/105

 This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto,
 which behave analogously to the corresponding unmasked functions. To
 expose this with a reasonable interface, I added a function np.copyto,
 which takes a 'where=' parameter just like the element-wise ufuncs.

 One thing which needs discussion is that I've flagged 'putmask' and
 PyArray_PutMask as deprecated, because 'copyto' and PyArray_MaskedMoveInto
 handle what those functions do but in a more flexible fashion. If there
 are any objections to deprecating 'putmask' and PyArray_PutMask, please
 speak up!

Mark,

Looks good!  Some quick tests with large and small arrays show copyto is 
faster than putmask when the source is an array and only a bit slower 
when the source is a scalar.
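
The comparison was along these lines (in IPython; the size and the
all-True mask are arbitrary):

import numpy as np
a = np.zeros(1000000)
b = np.ones_like(a)
m = np.ones(a.shape, dtype=bool)
%timeit np.copyto(a, b, where=m)   # array source
%timeit np.putmask(a, m, b)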

Eric


 Thanks,
 Mark


Re: [Numpy-discussion] using the same vocabulary for missing value ideas

2011-07-07 Thread Eric Firing
On 07/06/2011 07:51 PM, Chris Barker wrote:
 On 7/6/11 11:57 AM, Mark Wiebe wrote:
 On Wed, Jul 6, 2011 at 1:25 PM, Christopher Barker

  Is this really true? if you use a bitpattern for IGNORE, haven't you
  just lost the ability to get the original value back if you want to stop
  ignoring it? Maybe that's not inherent to what an IGNORE means, but it
  seems pretty key to me.

 What do you think of renaming IGNORE to SKIP?

 This isn't a semantics issue -- IGNORE is fine.

 What I'm getting at is that we need a word (and code) for:

 ignore for now, but I might want to use it later

HIDE?  That implies there is still something there, potentially recoverable.

Eric


 - Chris







Re: [Numpy-discussion] Missing/accumulating data

2011-07-01 Thread Eric Firing
On 07/01/2011 10:27 AM, Charles R Harris wrote:


 On Fri, Jul 1, 2011 at 1:39 PM, Christopher Barker
 chris.bar...@noaa.gov wrote:

 Joe Harrington wrote:
   All that has to happen is to allow the sense of the mask to be
   FALSE = the data are bad, TRUE = the data are good, and allow (not
   require) the mask to be of any numerical type, or at least of
   integer type as well as boolean.

 quick note on this: I like the FALSE == good way, because:

 instead of good and bad we think masked and unmasked, then we have:

 False = unmasked = regular old data
 True = masked = something special about the data

 The default for something special is bad (or missing, or
 ignore), but the cool thing is that if you use an int:

 0 = unmasked
 1 = masked because of one thing
 2 = masked because of another
 etc., etc.

 This could be pretty powerful


 I don't think the false/true dichotomy is something to worry about;
 it is an implementation detail that is hidden from the user...

But Joe's point and Chris's seemingly opposite (in terms of the Boolean 
value of the mask) point are that if it is not completely hidden, and if 
it is not restricted to be Boolean but is merely treated as Boolean with 
True meaning NA or Ignore, then it can be more powerful because it can 
carry additional information without affecting its Boolean functionality 
as a mask in ufuncs.

Although I might use such a capability if it existed, to reduce the need 
to have a separate flags array corresponding to a given data array, I 
think that for my own purposes this is very low priority, and chances 
are I would often use a separate flags array even if the underlying mask 
were not restricted to Boolean.

Eric


 Chuck


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Eric Firing
On 07/01/2011 06:40 PM, Nathaniel Smith wrote:
 On Fri, Jul 1, 2011 at 9:29 AM, Christopher Jordan-Squire

 BTW, you can't access the memory of a masked value by taking a view,
 at least if I'm reading this version of the NEP correctly, and it
 seems to be the latest:

 https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst

No, to see the latest you need to go to pull request #99, I believe:
https://github.com/numpy/numpy/pull/99
 From there click the "diff" button, then select
doc/neps/missing-data.rst, then "view file" to get to a formatted view
of the whole file in its most recent form. You can also look at the 
history of the file there.  c-masked-array.rst was renamed to 
missing-data.rst and editing continued.

Eric


Re: [Numpy-discussion] missing data discussion round 2

2011-06-30 Thread Eric Firing
On 06/30/2011 08:53 AM, Nathaniel Smith wrote:
 On Wed, Jun 29, 2011 at 2:21 PM, Eric Firing efir...@hawaii.edu wrote:
 In addition, for new code, the full-blown masked array module may not be
 needed.  A convenience it adds, however, is the automatic masking of
 invalid values:

 In [1]: np.ma.log(-1)
 Out[1]: masked

 I'm sure this horrifies some, but there are times and places where it is
 a genuine convenience, and preferable to having to use a separate
 operation to replace nan or inf with NA or whatever it ends up being.

 Err, but what would this even get you? NA, NaN, and Inf basically all
 behave the same WRT floating point operations anyway, i.e., they all
 propagate?

Not exactly. First, it depends on np.seterr; second, calculations on NaN 
can be very slow, so are better avoided entirely; third, if an array is 
passed to extension code, it is much nicer if that code only has one NA 
value to handle, instead of having to check for all possible bad values.


 Is the idea that if ufunc's gain a skipna=True flag, you'd also like
 to be able to turn it into a skipna_and_nan_and_inf=True flag?

No, it is to have a situation where skipna_and_nan_and_inf would not be 
needed, because an operation generating a nan or inf would turn those 
values into NA or IGNORE or whatever right away.
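
For what it's worth, the explicit two-step version of that convenience
looks something like this today (a sketch using np.errstate and
np.ma.masked_invalid, not a proposal for the in-core API):

import numpy as np

with np.errstate(invalid='ignore', divide='ignore'):
    y = np.log(np.array([-1.0, 0.0, 2.0]))   # nan, -inf, 0.693...
ym = np.ma.masked_invalid(y)   # nan and inf collapse to one masked state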

Eric


 -- Nathaniel


Re: [Numpy-discussion] missing data discussion round 2

2011-06-29 Thread Eric Firing
On 06/29/2011 09:32 AM, Matthew Brett wrote:
 Hi,

[...]

 Clearly there are some overlaps between what masked arrays are trying
 to achieve and what R's NA mechanisms are trying to achieve.  Are they
 really similar enough that they should function using the same API?
 And if so, won't that be confusing?  I think that's the question
 that's being asked.

And I think the answer is no.  No more confusing to people coming from 
R to numpy than views already are--with or without the NEP--and not 
*requiring* people to use any NA-related functionality beyond what they 
are used to from R.

My understanding of the NEP is that it directly yields an API closely 
matching that of R, but with the opportunity, via views, to do more with 
less work, if one so desires.  The present masked array module could be 
made more efficient if the NEP is implemented; regardless of whether 
this is done, the masked array module is not about to vanish, so anyone 
wanting precisely the masked array API will have it; and others remain 
free to ignore it (except for those of us involved in developing 
libraries such as matplotlib, which will have to support all variations 
of the new API along with the already-supported masked arrays).

In addition, for new code, the full-blown masked array module may not be 
needed.  A convenience it adds, however, is the automatic masking of 
invalid values:

In [1]: np.ma.log(-1)
Out[1]: masked

I'm sure this horrifies some, but there are times and places where it is 
a genuine convenience, and preferable to having to use a separate 
operation to replace nan or inf with NA or whatever it ends up being.

If np.seterr were extended to allow such automatic masking as an option, 
then the need for a separate masked array module would shrink further. 
I wouldn't mind having to use an explicit kwarg for ignoring NA in 
reduction methods.

Eric



 See you,

 Matthew


Re: [Numpy-discussion] missing data discussion round 2

2011-06-28 Thread Eric Firing
On 06/28/2011 07:26 AM, Nathaniel Smith wrote:
 On Tue, Jun 28, 2011 at 9:38 AM, Charles R Harris
 charlesr.har...@gmail.com  wrote:
 Nathaniel, an implementation using masks will look *exactly* like an
 implementation using na-dtypes from the user's point of view. Except that
 taking a masked view of an unmasked array allows ignoring values without
 destroying or copying the original data.

 Charles, I know that :-).

 But if that view thing is an advertised feature -- in fact, the key
 selling point for the masking-based implementation, included
 specifically to make a significant contingent of users happy -- then
 it's certainly user-visible. And it will make other users unhappy,
 like I said. That's life.

 But who cares? My main point is that implementing a missing data
 solution and a separate masked array solution is probably less work
 than implementing a single everything-to-everybody solution *anyway*,
 *and* it might make both sets of users happier too. Notice that in my
 proposal, there's really nothing there that isn't already in Mark's
 NEP in some form or another, but in my version there's almost no
 overlap between the two features. That's not because I was trying to
 make them artificially different; it's because I tried to think of the
 most natural ways to satisfy each set of use cases, and they're just
 different.

I think you are exaggerating some of the differences associated with the 
implementation, and ignoring one *key* difference: for integer types, 
the masked implementation can handle the full numeric range of the type, 
while the bit-pattern approach cannot.

Balanced against that, the *key* advantages of the bit-pattern approach 
would seem to be the simplicity of using a single array, particularly 
for IO (including memmapping) and interfacing with extension code. 
Although I am a heavy user of masked arrays, I consider these 
bit-pattern advantages to be substantial and deserving of careful 
consideration--perhaps of more weight and planning than they have gotten 
so far.

Datasets on disk--e.g. climatological data, numerical model output, 
etc.--typically do use reserved values as missing value flags, although 
occasionally one also finds separate mask arrays.

One of the real frustrations of the present masked array is that there 
is no savez/load support.  I could roll my own by using a convention 
like saving the mask of xxx as xxx__mask__, and then reversing the 
process in a modified load; but I haven't gotten around to doing it. 
Regardless of internal implementation, I hope that core support for 
missing values will be included in savez/load.
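
A sketch of that roll-my-own convention (the helper names are
hypothetical, and corner cases are untested):

import numpy as np

def masked_savez(fname, **arrays):
    # save each masked array as its data plus an 'xxx__mask__' companion
    out = {}
    for name, a in arrays.items():
        if np.ma.isMaskedArray(a):
            out[name] = np.ma.getdata(a)
            out[name + '__mask__'] = np.ma.getmaskarray(a)
        else:
            out[name] = a
    np.savez(fname, **out)

def masked_load(fname):
    # reverse the process: re-attach each 'xxx__mask__' to its data
    npz = np.load(fname)
    out = {}
    for name in npz.files:
        if name.endswith('__mask__'):
            continue
        mkey = name + '__mask__'
        if mkey in npz.files:
            out[name] = np.ma.MaskedArray(npz[name], mask=npz[mkey])
        else:
            out[name] = npz[name]
    return out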

Eric




 -- Nathaniel


Re: [Numpy-discussion] Concepts for masked/missing data

2011-06-25 Thread Eric Firing
On 06/25/2011 09:09 AM, Benjamin Root wrote:


 On Sat, Jun 25, 2011 at 1:57 PM, Nathaniel Smith n...@pobox.com wrote:

 On Sat, Jun 25, 2011 at 11:50 AM, Eric Firing efir...@hawaii.edu wrote:
   On 06/25/2011 07:05 AM, Nathaniel Smith wrote:
   On Sat, Jun 25, 2011 at 9:26 AM, Matthew
 Brettmatthew.br...@gmail.com mailto:matthew.br...@gmail.com  wrote:
   To clarify, you're proposing for:
  
   a = np.sum(np.array([np.NA, np.NA]))
  
   1) -> np.NA
   2) -> 0.0
  
   Yes -- and in R you actually do get NA, while in numpy.ma you
   actually do get 0. I don't think this is a coincidence; I think it's
  
   No, you don't:
  
   In [2]: np.ma.array([2, 4], mask=[True, True]).sum()
   Out[2]: masked
  
   In [4]: np.sum(np.ma.array([2, 4], mask=[True, True]))
   Out[4]: masked

 Huh. So in numpy.ma, sum([10, NA]) and sum([10]) are the same, but
 sum([NA]) and sum([]) are different? Sounds to me like you should file
 a bug on numpy.ma...


 Actually, no... I should have tested this before replying earlier:

   a = np.ma.array([2, 4], mask=[True, True])
   a
 masked_array(data = [-- --],
   mask = [ True  True],
 fill_value = 99)

   a.sum()
 masked
   a = np.ma.array([], mask=[])
   a
 masked_array(data = [],
   mask = [],
 fill_value = 1e+20)
   a.sum()
 masked

 They are the same.


 Anyway, the general point is that in R, NA's propagate, and in
 numpy.ma, masked values are ignored (except, apparently, if all values
 are masked). Here, I actually checked these:

 Python: np.ma.array([2, 4], mask=[True, False]).sum() -> 4
 R: sum(c(NA, 4)) -> NA


 If you want NaN behavior, then use NaNs.  If you want masked behavior,
 then use masks.

But I think that where Mark is heading is towards infrastructure that 
makes it easy and efficient to do either, as needed, case by case, line 
by line, for any dtype--not just floats.  If he can succeed, that helps 
all of us.  This doesn't have to be R versus masked arrays, or 
beginners versus experienced programmers.

Eric


 Ben Root





Re: [Numpy-discussion] feedback request: proposal to add masks to the core ndarray

2011-06-23 Thread Eric Firing
On 06/23/2011 11:19 AM, Nathaniel Smith wrote:
 I'd like to see a statement of what the missing data problem is, and
 how this solves it? Because I don't think this is entirely intuitive,
 or that everyone necessarily has the same idea.

 Reduction operations like 'sum', 'prod', 'min', and 'max' will operate as if 
 the values weren't there

 For context: My experience with missing data is in statistical
 analysis; I find R's NA support to be pretty awesome for those
 purposes. The conceptual model it's based on is that an NA value is
 some number that we just happen not to know. So from this perspective,
 I find it pretty confusing that adding an unknown quantity to 3 should
 result in 3, rather than another unknown quantity. (Obviously it
 should be possible to compute the sum of the known values, but IME
 it's important for the default behavior to be to fail loudly when
 things are wonky, not to silently patch them up, possibly
 incorrectly!)

 From the oceanographic data acquisition and analysis perspective, and 
perhaps from a more general plotting perspective (matplotlib, 
specifically) missing data is simply missing; we don't have it, we never 
will, but we need to do the best calculation (or plot) we can with what 
is left.  For plotting, that generally means showing a gap in a line, a 
hole in a contour plot, etc.  For calculations like basic statistics, it 
means doing the calculation, e.g. a mean, with the available numbers, 
*and* having an easy way to find out how many numbers were available. 
That's what the masked array count() method is for.
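
For example:

import numpy as np

a = np.ma.masked_invalid([1.0, np.nan, 3.0, 4.0])
a.mean()    # 2.666...: the mean of the values that are present
a.count()   # 3: how many values went into it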

Some types of calculations, like the FFT, simply can't be done by 
ignoring missing values, so one must first use some filling method, 
perhaps interpolation, for example, and then pass an unmasked array to 
the function.

The present masked array module is very close to what is really needed 
for the sorts of things I am involved with.  It looks to me like the 
main deficiencies are addressed by Mark's proposal, although the change 
in the definition of the mask might make for a painful transition.

Eric


 Also, what should 'dot' do with missing values?

 -- Nathaniel

 On Thu, Jun 23, 2011 at 1:53 PM, Mark Wiebe mwwi...@gmail.com wrote:
 Enthought has asked me to look into the missing data problem and how NumPy
 could treat it better. I've considered the different ideas of adding dtype
 variants with a special signal value and masked arrays, and concluded that
 adding masks to the core ndarray appears to be the best way to deal with the
 problem in general.
 I've written a NEP that proposes a particular design, viewable here:
 https://github.com/m-paradox/numpy/blob/cmaskedarray/doc/neps/c-masked-array.rst
 There are some questions at the bottom of the NEP which definitely need
 discussion to find the best design choices. Please read, and let me know of
 all the errors and gaps you find in the document.
 Thanks,
 Mark


Re: [Numpy-discussion] fast grayscale conversion

2011-06-20 Thread Eric Firing
On 06/20/2011 10:41 AM, Zachary Pincus wrote:
 You could try:
 src_mono = src_rgb.astype(float).sum(axis=-1) / 3.

 But that speed does seem slow. Here are the relevant timings on my machine (a 
 recent MacBook Pro) for a 3.1-megapixel-size array:
 In [16]: a = numpy.empty((2048, 1536, 3), dtype=numpy.uint8)

 In [17]: timeit numpy.dot(a.astype(float), numpy.ones(3)/3.)
 10 loops, best of 3: 116 ms per loop

 In [18]: timeit a.astype(float).sum(axis=-1)/3.
 10 loops, best of 3: 85.3 ms per loop

 In [19]: timeit a.astype(float)
 10 loops, best of 3: 23.3 ms per loop



On my slower machine (older laptop, core2 duo), you can speed it up more:

In [3]: timeit a.astype(float).sum(axis=-1)/3.0
1 loops, best of 3: 235 ms per loop

In [5]: timeit b = a.astype(float).sum(axis=-1); b /= 3.0
1 loops, best of 3: 181 ms per loop

In [7]: timeit b = a.astype(np.float32).sum(axis=-1); b /= 3.0
10 loops, best of 3: 148 ms per loop

If you really want float64, it is still faster to do the first operation 
with single precision:

In [8]: timeit b = a.astype(np.float32).sum(axis=-1).astype(np.float64); 
b /= 3.0
10 loops, best of 3: 163 ms per loop
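
A further option along the same lines (an untimed sketch): do the sum in
uint16, since 3*255 = 765 fits, so the float conversion happens only on
the 2-D result:

b = a.sum(axis=-1, dtype=np.uint16) / 3.0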

Eric




 On Jun 20, 2011, at 4:15 PM, Alex Flint wrote:

 At the moment I'm using numpy.dot to convert a WxHx3 RGB image to a 
 grayscale image:

 src_mono = np.dot(src_rgb.astype(np.float), np.ones(3)/3.);

 This seems quite slow though (several seconds for a 3 megapixel image) - is 
 there a more specialized routine better suited to this?

 Cheers,
 Alex



Re: [Numpy-discussion] unwrap for masked arrays?

2011-06-17 Thread Eric Firing
On 06/17/2011 06:56 AM, Benjamin Root wrote:
 It does not appear that unwrap works properly for masked arrays.  First,
 it uses np.asarray() at the start of the function.  However, that alone
 would not fix the problem given the nature of how unwrap works
 (performing diff operations).  I tried a slightly modified version of
 unwrap, but I could not get it to always work properly.  Anybody know of
 an alternative or a work-around?

http://currents.soest.hawaii.edu/hgstage/pycurrents/file/08f9137a2a08/data/navcalc.py
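
The idea there is roughly as follows (a minimal 1-D sketch, not the
navcalc.py code itself):

import numpy as np

def unwrap_masked(p):
    # unwrap only the unmasked values; masked entries stay masked
    p = np.ma.asanyarray(p)
    good = ~np.ma.getmaskarray(p)
    out = np.ma.masked_all(p.shape)
    out[good] = np.unwrap(p.compressed())
    return out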

Eric

 Thanks,
 Ben Root





Re: [Numpy-discussion] Python memory management issues using Linux. Maybe Numpy related.

2011-05-22 Thread Eric Firing
On 05/22/2011 08:17 AM, Jeffrey Spencer wrote:
 from numpy import arange, sum

 for x in range(1000):
  inhibVal = sum(arange(15))


Memory usage stays constant with Ubuntu 11.04, 64-bit, using the numpy 
1.5.1 package from ubuntu, and using 1.6.1.dev-a265004.


efiring@manini:~$ uname -a
Linux manini 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 
2011 x86_64 x86_64 x86_64 GNU/Linux


Eric

