Re: [Numpy-discussion] Floor division on int returns float
Thanks Eric. Also relevant: https://github.com/numba/numba/issues/909. Looks like Numba has found a way to avoid this edge case.

On Monday, April 4, 2016, Eric Firing <efir...@hawaii.edu> wrote:
> On 2016/04/04 9:23 AM, T J wrote:
>> I'm on NumPy 1.10.4 (mkl).
>>
>> >>> np.uint(3) // 2   # 1.0
>> >>> 3 // 2            # 1
>>
>> Is this behavior expected? It's certainly not desired from my
>> perspective. If this is not a bug, could someone explain the rationale
>> to me?
>>
>> Thanks.
>
> I agree that it's almost always undesirable; one would reasonably expect
> some sort of int. Here's what I think is going on:
>
> The odd behavior occurs only with np.uint, which is np.uint64, and when
> the denominator is a signed int. The problem is that if the denominator
> is negative, the result will be negative, so it can't have the same type
> as the numerator. Furthermore, if the denominator is -1, the result will
> be minus the numerator, and that can't be represented by np.uint or
> np.int. Therefore the result is returned as np.float64. The promotion
> rules are based on what *could* happen in an operation, not on what *is*
> happening in a given instance.
>
> Eric
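A minimal sketch of the promotion rule Eric describes, plus the usual workaround (keeping both operands unsigned). The exact behavior is version-dependent: the NEP 50 promotion rules in NumPy 2.0 later changed this, so treat the commented results as the behavior of that era:

    import numpy as np

    # uint64 mixed with a signed Python int: the result *could* be
    # negative or overflow either integer type, so NumPy promotes the
    # operation to float64.
    print((np.uint64(3) // 2).dtype)             # float64 (pre-NEP 50)

    # Keeping both operands uint64 sidesteps the promotion entirely.
    print((np.uint64(3) // np.uint64(2)).dtype)  # uint64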
[Numpy-discussion] Floor division on int returns float
I'm on NumPy 1.10.4 (mkl).

>>> np.uint(3) // 2   # 1.0
>>> 3 // 2            # 1

Is this behavior expected? It's certainly not desired from my perspective. If this is not a bug, could someone explain the rationale to me?

Thanks.
Re: [Numpy-discussion] Round away from zero (towards +/- infinity)
It does, but it is not portable. That's why I was hoping NumPy might think about supporting more rounding modes.

On Thu, Oct 2, 2014 at 10:00 PM, John Zwinck <jzwi...@gmail.com> wrote:
> On 3 Oct 2014 07:09, T J <tjhn...@gmail.com> wrote:
>> Any bites on this?
>>
>> On Wed, Sep 24, 2014 at 12:23 PM, T J <tjhn...@gmail.com> wrote:
>>> Python's round function rounds away from zero, so I am looking for the
>>> NumPy equivalent (and using vectorize() seems undesirable). In this
>>> sense, it seems that having a ufunc for this type of rounding could be
>>> helpful.
>>>
>>> Aside: Is there interest in a more general around() that allows users
>>> to specify alternative tie-breaking rules, with the default staying
>>> 'round half to even'? [1]
>>>
>>> [1] http://stackoverflow.com/questions/16000574/tie-breaking-of-round-with-numpy
>
> I like the solution given in that Stack Overflow post, namely using
> ctypes to call fesetround(). Does that work for you?
[Numpy-discussion] 0/0 == 0?
Hi, I'm using NumPy 1.8.2:

In [1]: np.array(0) / np.array(0)
Out[1]: 0

In [2]: np.array(0) / np.array(0.0)
Out[2]: nan

In [3]: np.array(0.0) / np.array(0)
Out[3]: nan

In [4]: np.array(0.0) / np.array(0.0)
Out[4]: nan

In [5]: 0/0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-6-6549dea6d1ae> in <module>()
----> 1 0/0

ZeroDivisionError: integer division or modulo by zero

Out[1] seems odd. I get the right value in 1.8.1. Was this fixed for 1.9.0?
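For context, integer division by zero in NumPy sets an error state rather than raising by default. A minimal sketch, assuming the now-standard semantics (integer 0/0 returns 0 and flags 'divide'), of how to surface it as an exception:

    import numpy as np

    with np.errstate(divide='raise'):
        try:
            np.array(0) // np.array(0)   # integer division by zero
        except FloatingPointError as e:
            print("caught:", e)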
Re: [Numpy-discussion] Round away from zero (towards +/- infinity)
Any bites on this?

On Wed, Sep 24, 2014 at 12:23 PM, T J <tjhn...@gmail.com> wrote:
> Is there a ufunc for rounding away from zero? Or do I need to do
>
>     x2 = sign(x) * ceil(abs(x))
>
> whenever I want to round away from zero? Maybe the following is better?
>
>     x_ceil = ceil(x)
>     x_floor = floor(x)
>     x2 = where(x >= 0, x_ceil, x_floor)
>
> Python's round function rounds away from zero, so I am looking for the
> NumPy equivalent (and using vectorize() seems undesirable). In this
> sense, it seems that having a ufunc for this type of rounding could be
> helpful.
>
> Aside: Is there interest in a more general around() that allows users to
> specify alternative tie-breaking rules, with the default staying 'round
> half to even'? [1]
>
> Also, what is the difference between NumPy's fix() and trunc()
> functions? It seems like they achieve the same goal. trunc() was added
> in 1.3.0. So is fix() just legacy?
>
> [1] http://stackoverflow.com/questions/16000574/tie-breaking-of-round-with-numpy
[Numpy-discussion] Round away from zero (towards +/- infinity)
Is there a ufunc for rounding away from zero? Or do I need to do

    x2 = sign(x) * ceil(abs(x))

whenever I want to round away from zero? Maybe the following is better?

    x_ceil = ceil(x)
    x_floor = floor(x)
    x2 = where(x >= 0, x_ceil, x_floor)

Python's round function rounds away from zero, so I am looking for the NumPy equivalent (and using vectorize() seems undesirable). In this sense, it seems that having a ufunc for this type of rounding could be helpful.

Aside: Is there interest in a more general around() that allows users to specify alternative tie-breaking rules, with the default staying 'round half to even'? [1]

Also, what is the difference between NumPy's fix() and trunc() functions? It seems like they achieve the same goal. trunc() was added in 1.3.0. So is fix() just legacy?

[1] http://stackoverflow.com/questions/16000574/tie-breaking-of-round-with-numpy
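A vectorized round-away-from-zero needs no vectorize(); a minimal sketch equivalent to the sign/ceil/abs form above, using copysign so negatives and signed zeros come out in one call:

    import numpy as np

    def round_away_from_zero(x):
        # Take the ceiling of the magnitude, then restore the sign.
        x = np.asarray(x, dtype=float)
        return np.copysign(np.ceil(np.abs(x)), x)

    print(round_away_from_zero([-1.5, -0.2, 0.0, 0.2, 1.5]))
    # [-2. -1.  0.  1.  2.]

For the tie-breaking variant (round only *halves* away from zero, as Python 2's round() does), np.copysign(np.floor(np.abs(x) + 0.5), x) is the analogous one-liner.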
[Numpy-discussion] Missing Data
What is the status of: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst and of missing data in NumPy, more generally? Is np.ma.array still the state-of-the-art way to handle missing data? Or has something better and more comprehensive been put together?
[Numpy-discussion] Vectorize and ufunc attribute
Prior to 1.7, I had working compatibility code such as the following:

    if has_good_functions:
        # http://projects.scipy.org/numpy/ticket/1096
        from numpy import logaddexp, logaddexp2
    else:
        logaddexp = vectorize(_logaddexp, otypes=[numpy.float64])
        logaddexp2 = vectorize(_logaddexp2, otypes=[numpy.float64])

        # Run these at least once so that .ufunc.reduce exists
        logaddexp([1., 2., 3.], [1., 2., 3.])
        logaddexp2([1., 2., 3.], [1., 2., 3.])

        # And then make reduce available at the top level
        logaddexp.reduce = logaddexp.ufunc.reduce
        logaddexp2.reduce = logaddexp2.ufunc.reduce

The point was that I wanted to treat the output of vectorize() as a hacky drop-in replacement for a ufunc. In 1.7, I discovered that vectorize() had changed (https://github.com/numpy/numpy/pull/290), and now there is no longer a ufunc attribute at all. Should this be added back in? Besides hackish drop-in replacements, I see value in being able to call reduce, accumulate, etc. (when possible) on the output of vectorize().
Re: [Numpy-discussion] Vectorize and ufunc attribute
On Tue, Mar 12, 2013 at 9:59 AM, Bradley M. Froehle <brad.froe...@gmail.com> wrote:
> T J: You may want to look into `numpy.frompyfunc`
> (http://docs.scipy.org/doc/numpy/reference/generated/numpy.frompyfunc.html).

Yeah, that's better, but it doesn't respect the output type of the function. It'd be nice if this supported the otypes keyword.
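A hedged sketch of the frompyfunc route: unlike vectorize(), it returns a true ufunc, so .reduce and .accumulate exist, but the result dtype is always object and must be cast back by hand (which is exactly the missing-otypes complaint above). The _logaddexp helper is illustrative, not the original poster's code:

    import math
    import numpy as np

    def _logaddexp(x, y):
        # log(exp(x) + exp(y)), written naively for illustration
        return math.log(math.exp(x) + math.exp(y))

    logaddexp = np.frompyfunc(_logaddexp, 2, 1)  # 2 inputs, 1 output

    a = np.array([1.0, 2.0, 3.0])
    out = logaddexp(a, a)              # object array, not float64
    print(out.astype(np.float64))      # restore the dtype manually
    print(float(logaddexp.reduce(a)))  # .reduce works, unlike vectorize()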
Re: [Numpy-discussion] Fwd: [numpy] ENH: Initial implementation of a 'neighbor' calculation (#303)
On Fri, Oct 12, 2012 at 1:04 PM, Sturla Molden <stu...@molden.no> wrote:
> I'm still rather sure GIS functionality belongs in scipy.spatial
> instead of numpy.

From the link:

    FocalMax: Finds the highest value for each cell location on an input
    grid within a specified neighborhood and sends it to the
    corresponding cell location on the output grid.

Isn't this much more generic than GIS? Maybe the criterion should be: would you expect to find extensive discussion of this functionality within a standard text on spatial data structures? Sliding windows and nice boundary handling seem less "spatial" to me, and much further from B-trees, quadtrees, kd-trees, persistent data structures, etc. Can you elaborate your position a bit more? Sorry if I'm being dense.
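For concreteness, a focal max is just a sliding-window maximum, which scipy.ndimage already exposes generically; a minimal sketch, assuming SciPy is available:

    import numpy as np
    from scipy import ndimage

    grid = np.arange(16.0).reshape(4, 4)

    # Highest value in the 3x3 neighborhood around each cell. The mode
    # argument picks the boundary handling ('nearest', 'wrap', ...).
    focal_max = ndimage.maximum_filter(grid, size=3, mode='nearest')
    print(focal_max)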
Re: [Numpy-discussion] Meta: help, devel and stackoverflow
On Sat, Jun 30, 2012 at 1:26 PM, <josef.p...@gmail.com> wrote:
> just some statistics
>
> http://stackoverflow.com/questions/tagged/numpy
> 769 followers, 2,850 questions tagged
> a guess: average response time for a regular usage question is far
> less than an hour
>
> http://stackoverflow.com/questions/tagged/scipy
> 446 followers, 991 questions tagged

Yes, they are frequently very quick and pinpoint. To provide yet another data point, I will mention that I used to be an avid follower of comp.text.tex. I would post questions there and also read it for knowledge. Now, I use http://tex.stackexchange.com/ almost exclusively. I know many others have done the same. I've also noticed a number of LaTeX gurus using the stackexchange site more and more. Try googling a LaTeX (esp. TikZ) question. Would you rather read through an archived newsgroup (mailing list in NumPy's case) or have a webpage with useful features, embedded images, etc.?

jdh noticed this as well: the majority of the messages to numpy-discussion in the last 2 months have not been usage questions but decisions, releases, debates, etc. Personally, I would push for the stackexchange solution over a 'user' mailing list. That said, comp.text.tex and tex.stackexchange.com coexist just fine---it just means there is redundancy, and not the good kind, IMO.
Re: [Numpy-discussion] Meta: help, devel and stackoverflow
On Sat, Jun 30, 2012 at 1:50 PM, srean <srean.l...@gmail.com> wrote:
> Anecdotal data-point: I have been happy with SO in general. It works
> for certain types of queries very well. OTOH if the answer to the
> question is known only to a few, and he/she does not happen to be
> online at the time the question was posted, and he/she does not pull
> such possible questions by key-words, that question is all but
> history. The difference is that on a mailing list questions are pushed
> on to people who might be able to answer them, whereas in the SO model
> people have to actively seek questions they want to answer.
> Unanticipated, niche questions tend to disappear.

Isn't that what the various sections are for?

http://stackoverflow.com/questions?sort=newest
http://stackoverflow.com/questions?sort=unanswered

And then, if you want modification-by-modification updates:

http://stackoverflow.com/questions?sort=active

Entries are sorted by date, and you can view as many pages' worth as are available.
Re: [Numpy-discussion] Meta: help, devel and stackoverflow
On Thu, Jun 28, 2012 at 3:23 PM, Fernando Perez <fperez@gmail.com> wrote:
> On Thu, Jun 28, 2012 at 3:06 PM, srean <srean.l...@gmail.com> wrote:
>> What I like about having two lists is that on one hand it does not
>> prevent me or you from participating in both; on the other hand, it
>> allows those who don't want to delve too deeply into one aspect or
>> the other the option of a cleaner inbox, or the option of having
>> separate inboxes. I for instance would like to be on both lists,
>> perhaps mostly as a lurker, but still would want to have two
>> different folders just for better organization.
>
> I just want to mention that even as a project leader, I benefit from
> this: when I'm swamped, I simply ignore the user list. Not a nice
> thing to do, perhaps, but given the choice between moving the project
> forward and helping a new user, with often very limited time, I think
> it's the best solution possible. Of course I do help in the user list
> when I can, but I mostly encourage more experienced users to help new
> ones, so that our small dev team can spend its limited time moving the
> project forward.

I'm okay with having two lists, as it does filtering for me, but this seems like a sub-optimal solution. Observation: some people would like to apply labels to incoming messages. Reality: email was not really designed for that. We can hack it by using two different email addresses, but why not just keep this list as is and make a concentrated effort to promote the use of 2.0 technologies, like stackoverflow/askbot/etc.? There, people can put as many tags as desired on questions: matrix, C-API, iteration, etc. Potentially, these tags would streamline everyone's workflow. The stackoverflow setup also makes it easier for users to search for solutions to common questions, and to know that the top answer is still an accurate answer. [No one likes finding old, invalid solutions.] The reputation system and up/down votes also help new users figure out which responses to trust.

As others have explained, there are two distinct types of discussions that take place on this list:

1) Community discussions/debates. Examples are the NA discussion, the bug tracker, the release schedule, ABI/API changes, "matrix rank tolerance too low", lazy evaluation, etc. These are clearly mailing-list topics. If you look at all the messages for the last two(!) months, this has been the dominant type.

2) Standard questions. Recent examples are "memory allocation at assignment", "dot() function question", "not expected output of fill_diagonal", "silly isscalar question". These messages seem much better suited to the stackoverflow environment. In fact, I'd be happy if we redirected such questions to stackoverflow. This has the added benefit that responses to such questions will stay on topic. Note that if a stackoverflow question seeds a discussion, then someone can start a new thread on the mailing list which cites the stackoverflow question.

tl;dr: Keep this list the same, and push user questions to stackoverflow instead of pushing them to a user list.
Re: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)
On Wed, May 23, 2012 at 4:16 PM, Kathleen M Tacina <kathleen.m.tac...@nasa.gov> wrote:
> On Wed, 2012-05-23 at 17:31 -0500, Nathaniel Smith wrote:
>> On Wed, May 23, 2012 at 10:53 PM, Travis Oliphant <tra...@continuum.io> wrote:
>>> To be clear, I'm not opposed to the change, and it looks like we
>>> should go forward. In my mind it's not about developers vs. users, as
>>> satisfying users is the whole point. The purpose of NumPy is not to
>>> make its developers happy :-). But users also want there to *be*
>>> developers on NumPy, so developer happiness is not irrelevant. In
>>> this case, though, there are consequences for users because of the
>>> double copy if a user wants to make their code future-proof. We are
>>> always trading off predicted user experiences. I hope that we all
>>> don't have the same perspective on every issue; otherwise, more than
>>> likely there aren't enough voices being heard from real users.
>>
>> I'm not really worried about users who have a problem with the
>> double-copy. It's a totally legitimate concern, but anyone who has
>> that concern has already understood the issues well enough to be able
>> to take care of themselves, and decided that it's worth the effort to
>> special-case this. They can check whether the returned array has .base
>> set to tell whether it's an array or a view, use a temporary hack to
>> check for the secret warning flag in arr.flags.num, check the numpy
>> version, all sorts of things to get them through the one version where
>> this matters. The suggestion in the docs to make a copy is not exactly
>> binding :-).
>>
>> -- Nathaniel
>
> As a real user, if I care about whether an array arr2 is a copy or a
> view, I usually either check arr2.flags.owndata or append copy() to the
> statement that created arr2, e.g., arr2 = arr.diagonal().copy(). NumPy
> rules on views vs. copies sometimes require a bit of thought, and so
> I'll frequently just check the flags or make a copy instead of
> thinking. (More foolproof :).)

It seems that there are a number of ways to check if an array is a view. Do we have a preferred way in the API that is guaranteed to stay available? Or are all of the various methods here to stay?
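A minimal sketch of the two checks mentioned in this thread (.base and flags.owndata); which of these is the blessed API was exactly the question being asked:

    import numpy as np

    arr = np.eye(3)
    d = arr.diagonal()

    # Two common ways to ask "is this a view?"
    print(d.base is not None)   # True when d is a view onto another array
    print(d.flags.owndata)      # False when d does not own its memory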
Re: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)
On Fri, May 11, 2012 at 1:12 PM, Mark Wiebe <mwwi...@gmail.com> wrote:
> On Fri, May 11, 2012 at 2:18 PM, Pauli Virtanen <p...@iki.fi> wrote:
>> 11.05.2012 17:54, Frédéric Bastien kirjoitti:
>>> In Theano we use a view, but that is not relevant, as it is the
>>> compiler that tells what is inplace. So this is invisible to the
>>> user. What about a parameter to diagonal() that defaults to returning
>>> a non-writable view, as you said? The user can then choose what they
>>> want, and this doesn't break the interface.
>> [clip]
>> Agreed, it seems this is the appropriate way to go on here:
>> `diagonal(copy=True)`. A more obscure alternative would be to add a
>> separate method that returns a view.
>
> This looks like the best way to deal with it, yes.
>
> Cheers, Mark

I don't think changing the default behavior in a later release is a good idea. It's a sort of an API wart, but IMHO better that than subtle code breakage. copy=True seems fine, but is this the final plan? What about the long term: should diag() eventually be brought in line with transpose() and reshape() so that it is a view by default?

Changing default behavior is certainly not something that should be done all the time, but it *can* be done if deprecated appropriately. A more consistent API is better than one with warts (if this particular issue is actually seen as a wart).
Re: [Numpy-discussion] Numpy governance update
On Wed, Feb 15, 2012 at 12:45 PM, Alan G Isaac <alan.is...@gmail.com> wrote:
> for the core developers. The right way to produce a governance
> structure is to make concrete proposals and show how these proposals
> are in the interest of the *developers* (as well as of the users).

At this point, it seems to me that Matthew is simply trying to make the case that ==some== governance structure should be (or should be more clearly) set up. Even this fairly modest goal seems to be receiving blowback from some. Perhaps a specific proposal would be more convincing to those in opposition, but I'd like to think that the merits of a governance structure could be appreciated without the specific details of what such a structure would look like. Matthew's links certainly make a good case, IMO. He has also described a number of scenarios where a governance structure would be helpful (even if we think the likelihood of such scenarios is small).
Re: [Numpy-discussion] in the NA discussion, what can we agree on?
On Fri, Nov 4, 2011 at 11:59 AM, Pauli Virtanen <p...@iki.fi> wrote:
> I have a feeling that if you don't start by mathematically defining the
> scalar operations first, and only after that generalize them to arrays,
> some conceptual problems may follow.

Yes. I was going to mention this point as well.

> For shorthand, we can refer to the above choices with the nomenclature
>
>     shorthand     ::= propagation destructivity payload_type
>     propagation   ::= P | N
>     destructivity ::= d | n | s
>     payload_type  ::= S | E | C
>
> That makes 2 * 3 * 3 = 18 different ways to construct consistent
> behavior. Some of them might make sense, the problem is to find out
> which :)

This is great for the discussion, IMO. The self-destructive assignment hasn't come up at all, so I'm guessing we can probably ignore it.

Can you be a bit more explicit on the payload types? Let me try; respond with corrections if necessary.

S is "singleton": in the case of missing data, we take it to mean that we only care *that* data is missing, not *how* missing the data is.

>>> x = MISSING
>>> -x      # unary
MISSING
>>> x + 3   # binary
MISSING

E means that we acknowledge that we want to track the "how", but that we aren't interested in it. So raise an error. In the case of ignored data, we might have:

>>> x = 2
>>> ignore(x)
>>> x
IGNORED(2)
>>> -x
Error
>>> x + 3
Error

C means that we acknowledge that we want to track the "how", and that we are interested in it. So do the computations.

>>> x = 2
>>> ignore(x)
>>> -x
IGNORED(-2)
>>> x + 3
IGNORED(5)

Did I get that mostly right? NaN and NA apparently fall into the PdS class. Here is where I think we need to be a bit more careful. It is true that we want NaN and MISSING to propagate, but then we additionally want to ignore them sometimes. This is precisely why we have functions like nansum. Although people are well aware of this desire, I think this thread has largely conflated the issues when discussing propagation.

To push this forward a bit, can I propose that IGNORE behave as PnC:

>>> x = np.array([1, 2, 3])
>>> y = np.array([10, 20, 30])
>>> ignore(x[1])
>>> x
[1, IGNORED(2), 3]
>>> x + 2
[3, IGNORED(4), 5]
>>> x + y
[11, IGNORED(22), 33]
>>> z = x.sum()
>>> z
IGNORED(6)
>>> unignore(z)
>>> z
6
>>> x.sum(skipIGNORED=True)
4

When done in this fashion, I think it is perfectly fine for masks to be unmasked.
Re: [Numpy-discussion] in the NA discussion, what can we agree on?
On Fri, Nov 4, 2011 at 1:03 PM, Gary Strangman <str...@nmr.mgh.harvard.edu> wrote:
>> To push this forward a bit, can I propose that IGNORE behave as PnC:
>>
>> >>> x = np.array([1, 2, 3])
>> >>> y = np.array([10, 20, 30])
>> >>> ignore(x[1])
>> >>> x
>> [1, IGNORED(2), 3]
>> >>> x + 2
>> [3, IGNORED(4), 5]
>> >>> x + y
>> [11, IGNORED(22), 33]
>> >>> z = x.sum()
>> >>> z
>> IGNORED(6)
>> >>> unignore(z)
>> >>> z
>> 6
>> >>> x.sum(skipIGNORED=True)
>> 4
>
> In my mind, IGNORED items should be skipped by default (i.e.,
> skipIGNORED seems redundant... isn't that what ignoring is all about?).
> Thus I might instead suggest the opposite (default) behavior at the end:
>
> >>> z = x.sum()
> >>> z
> 4
> >>> unignore(x).sum()
> 6
> >>> x.sum(keepIGNORED=True)
> 6
>
> (Obviously all the syntax is totally up for debate.)

I agree that it would be ideal if the default were to skip IGNORED values, but that behavior seems inconsistent with its propagation properties (such as when adding arrays with IGNORED values). To illustrate, when we did x + 2, we were stating that:

    IGNORED(2) + 2 == IGNORED(4)

which means that we propagated the IGNORED value. If we were to skip them by default, then we'd have:

    IGNORED(2) + 2 == 2

To be consistent, then, it seems we also should have had:

    >>> x + 2
    [3, 2, 5]

which I think we can agree is not so desirable. What this seems to come down to is that we tend to want different behavior when we are doing reductions: for IGNORED data, we want it to propagate in every situation except for a reduction (where we want to skip over it). I don't know if there is a well-defined way to distinguish reductions from the other operations. Would it hold for generalized ufuncs? Would it hold for other functions which might return arrays instead of scalars?
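For reference, this is exactly the split numpy.ma implements today: elementwise operations propagate the mask, while reductions skip masked entries. A minimal sketch:

    import numpy as np

    x = np.ma.array([1, 2, 3], mask=[False, True, False])

    print(x + 2)      # [3 -- 5]: elementwise, the mask propagates
    print(x.sum())    # 4: the reduction skips the masked value
    print(x.data)     # [1 2 3]: the original payload is untouched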
Re: [Numpy-discussion] in the NA discussion, what can we agree on?
On Fri, Nov 4, 2011 at 2:41 PM, Pauli Virtanen <p...@iki.fi> wrote:
> 04.11.2011 20:49, T J kirjoitti:
> [clip]
>> To push this forward a bit, can I propose that IGNORE behave as: PnC
>
> The *n* classes can be a bit confusing in Python:
>
> ### PnC
> >>> x = np.array([1, 2, 3])
> >>> y = np.array([4, 5, 6])
> >>> ignore(y[1])
> >>> z = x + y
> >>> z
> np.array([5, IGNORE(7), 9])
> >>> x += y   # NB: x[1] := x[1] + y[1]
> >>> x
> np.array([5, 2, 9])
>
> ***
>
> Interesting. I think I defined "destructive" and "non-destructive" in a
> different way than earlier in the thread. Maybe this behavior from
> np.ma is closer to what was meant earlier:
>
> >>> x = np.ma.array([1, 2, 3], mask=[0, 0, 1])
> >>> y = np.ma.array([4, 5, 6], mask=[0, 1, 1])
> >>> x += y
> >>> x
> masked_array(data = [5 -- --],
>              mask = [False True True],
>              fill_value = 999999)
> >>> x.data
> array([5, 2, 3])
>
> Let's call this (since I botched and already reserved the letter n :)
>
> (m) mark-ignored
>
>     a := SPECIAL_1
>     # -> a == SPECIAL_a ; the payload of the RHS is neglected,
>     #    the assigned value has the original LHS as the payload

Does this behave as expected for x + y (as opposed to the in-place operation)?

>>> z = x + y
>>> z
np.array([5, IGNORED(2), IGNORED(3)])
>>> x += y
>>> x
np.array([5, IGNORED(2), IGNORED(3)])

However, doesn't this have the issue that Nathaniel brought up earlier, commutativity:

    unignore(x + y) != unignore(y + x)
Re: [Numpy-discussion] in the NA discussion, what can we agree on?
On Fri, Nov 4, 2011 at 2:29 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Fri, Nov 4, 2011 at 1:22 PM, T J <tjhn...@gmail.com> wrote:
>> I agree that it would be ideal if the default were to skip IGNORED
>> values, but that behavior seems inconsistent with its propagation
>> properties (such as when adding arrays with IGNORED values). [...]
>> What this seems to come down to is that we tend to want different
>> behavior when we are doing reductions: for IGNORED data, we want it
>> to propagate in every situation except for a reduction (where we want
>> to skip over it). I don't know if there is a well-defined way to
>> distinguish reductions from the other operations. Would it hold for
>> generalized ufuncs? Would it hold for other functions which might
>> return arrays instead of scalars?
>
> Continuing my theme of looking for consensus first... there are
> obviously a ton of ugly corners in here. But my impression is that at
> least for some simple cases, it's clear what users want:
>
> >>> a = [1, IGNORED(2), 3]
> >>> a + 2   # array-with-ignored-values + unignored scalar only
> >>>         # affects unignored values
> [3, IGNORED(2), 5]
> >>> np.sum(a)   # reduction operations skip ignored values
> 4
>
> For example, Gary mentioned the common idiom of wanting to take an
> array and subtract off its mean, and he wants to do that while leaving
> the masked-out/ignored values unchanged. As long as the above cases
> work the way I wrote, we will have
>
> >>> np.mean(a)
> 2
> >>> a -= np.mean(a)
> >>> a
> [-1, IGNORED(2), 1]
>
> which I'm pretty sure is the result that he wants. (Gary, is that
> right?) Also numpy.ma follows these rules, so that's some additional
> evidence that they're reasonable. (And I think part of the confusion
> between Lluís and me was that these are the rules that I meant when I
> said "non-propagating", but he understood that to mean something else.)
>
> So before we start exploring the whole vast space of possible ways to
> handle masked-out data, does anyone see any reason to consider rules
> that don't have, as a subset, the ones above? Do other rules have any
> use cases or user demand? (I *love* playing with clever mathematics
> and making things consistent, but there's not much point unless the
> end result is something that people will use :-).)

I guess I'm just confused about how one, in principle, would distinguish the various forms of propagation that you are suggesting (i.e., for reductions). I also don't think it is good that we lack commutativity. If we disallow unignoring, then yes, I agree that what you wrote above is what people want. But if we are allowed to unignore, then I do not.

Also, how does something like this get handled?

>>> a = [1, 2, IGNORED(3), NaN]

If I were to ask "what is the mean of a?", then I think most of the time people would want 1.5. I guess if we kept nanmean around, then we could do:

>>> a -= np.nanmean(a)
>>> a
[-0.5, 0.5, IGNORED(3), NaN]

Sorry if this is considered digging deeper than consensus. I'm just curious whether arrays having NaNs in them, in addition to IGNORED, cause problems.
Re: [Numpy-discussion] in the NA discussion, what can we agree on?
On Fri, Nov 4, 2011 at 3:38 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Fri, Nov 4, 2011 at 3:08 PM, T J <tjhn...@gmail.com> wrote:
>> I guess I'm just confused about how one, in principle, would
>> distinguish the various forms of propagation that you are suggesting
>> (i.e., for reductions).
>
> Well, numpy.ma does work this way, so certainly it's possible to do.
> At the code level, np.add() and np.add.reduce() are different entry
> points and can behave differently.

I see your point, but that seems like just an API difference with a bad name. reduce() is just calling add() a bunch of times, so it seems like it should behave as add() does. That we can create different behaviors with various assignment rules (like Pauli's 'm' for mark-ignored) only makes it more confusing to me.

>>> a = 1
>>> a += 2
>>> a += IGNORE
>>> b = 1 + 2 + IGNORE

I think having a == b is essential. If they can be different, that will only lead to confusion. On this point alone, does anyone think it is acceptable to have a != b?

> OTOH, it might be that it's impossible to do *while still maintaining
> other things we care about*... but in that case we should just shake
> our fists at the mathematics and then give up, instead of coming up
> with an elegant system that isn't actually useful. So that's why I
> think we should figure out what's useful first.

Agreed. I'm on the same page.

>> I also don't think it is good that we lack commutativity. If we
>> disallow unignoring, then yes, I agree that what you wrote above is
>> what people want. But if we are allowed to unignore, then I do not.
>
> I *think* that for the no-unignoring (also known as MISSING) case, we
> have a pretty clear consensus that we want something like:
>
> >>> a + 2
> [3, MISSING, 5]
> >>> np.sum(a)
> MISSING
> >>> np.sum(a, skip_MISSING=True)
> 4
>
> (Please say if you disagree, but I really hope you don't!) This case
> is also easier, because we don't even have to allow a skip_MISSING
> flag in cases where it doesn't make sense (e.g., unary or binary
> operations) -- it's a convenience feature, so no-one will care if it
> only works when it's useful ;-).

Yes, in agreement.

I was talking specifically about the IGNORE case. And my point is that if we allow people to remove the IGNORE flag and see the original data (and if the payloads are computed), then we should care about commutativity:

>>> x = [1, IGNORE(2), 3]
>>> x2 = x.copy()
>>> y = [10, 11, IGNORE(12)]
>>> z = x + y
>>> a = z.sum()
>>> x += y
>>> b = x.sum()
>>> y += x2
>>> c = y.sum()

So, we should have a == b == c. Additionally, if we allow users to unignore data, then we should have:

>>> x = [1, IGNORE(2), 3]
>>> x2 = x.copy()
>>> y = [10, 11, IGNORE(12)]
>>> x += y
>>> aa = unignore(x).sum()
>>> y += x2
>>> bb = unignore(y).sum()
>>> aa == bb
True

Is there agreement on this?

>> Also, how does something like this get handled?
>>
>> >>> a = [1, 2, IGNORED(3), NaN]
>>
>> If I were to ask "what is the mean of a?", then I think most of the
>> time people would want 1.5.
>
> I would want NaN! But that's because the only way I get NaN's is when
> I do dumb things like compute log(0), and again, I want my code to
> tell me that I was dumb instead of just quietly making up a
> meaningless answer.

That's definitely field-specific then. In probability, the convention 0 log(0) = 0 is a common idiom. In NumPy, 0 * np.log(0) gives NaN, so you'd want to skip those terms when summing.
Re: [Numpy-discussion] in the NA discussion, what can we agree on?
On Fri, Nov 4, 2011 at 4:29 PM, Pauli Virtanen <p...@iki.fi> wrote:
> 04.11.2011 23:29, Pauli Virtanen kirjoitti:
> [clip]
>> As the definition concerns only what happens on assignment, it does
>> not have problems with commutativity.
>
> This is of course then not really true in a wider sense, as an example
> from T J shows:
>
>     a = 1
>     a += IGNORE(3)
>     # -> a := a + IGNORE(3)
>     # -> a := IGNORE(4)
>     # -> a == IGNORE(1)
>
> which is different from
>
>     a = 1 + IGNORE(3)
>     # -> a == IGNORE(4)
>
> Damn, it seemed so good. Probably anything except destructive
> assignment leads to problems like this with propagating special values.

OK... with what I understand now, it seems like for almost all operations:

    MISSING : PdS
    IGNORED : PdC (this gives commutativity when unignoring data points)

When we want some sort of reduction, we change the behavior for IGNORED so that it skips the IGNORED values by default. Personally, I still believe that this inconsistent behavior warrants a new method name. What I mean is:

>>> x = np.array([1, IGNORED(2), 3])
>>> y = x.sum()
>>> z = x[0] + x[1] + x[2]

To say that y != z will only be a source of confusion. To remedy this, we force people to be explicit, even if they'll need to be explicit 99% of the time:

>>> q = x.sum(skipIGNORED=True)

Then we can have y == z and y != q. To make the 99% use case easier, we provide a new method which passes the keyword for us.

With PdS and PdC, it seems rather clear to me why MISSING should be implemented as a bit pattern and IGNORED implemented using masks. Setting implementation details aside and going back to Nathaniel's original "biggest *un*resolved question", I am now convinced that these (IGNORED and MISSING) should be distinct API concepts, and still yet distinct from NaN with floating point dtypes. The NA implementation in NumPy does not seem to match either of these (IGNORED and MISSING) exactly. One cannot, as far as I know, unignore an element marked as NA.
Re: [Numpy-discussion] in the NA discussion, what can we agree on?
On Fri, Nov 4, 2011 at 6:31 PM, Pauli Virtanen <p...@iki.fi> wrote:
> 05.11.2011 00:14, T J kirjoitti:
> [clip]
>> a = 1
>> a += 2
>> a += IGNORE
>> b = 1 + 2 + IGNORE
>>
>> I think having a == b is essential. If they can be different, that
>> will only lead to confusion. On this point alone, does anyone think
>> it is acceptable to have a != b?
>
> It seems to me that requiring this sort of a thing gives some
> limitations on how array operations should behave.
>
> An acid test for proposed rules: given two arrays `a` and `b`,
>
>     a = [1, 2, IGNORED(3), IGNORED(4)]
>     b = [10, IGNORED(20), 30, IGNORED(40)]
>
> (a) Are the following pieces of code equivalent:
>
>     print unmask(a + b)
>     a += 42
>     a += b
>     print unmask(a)
>
> and
>
>     print unmask(b + a)
>     a += b
>     a += 42
>     print unmask(a)
>
> (b) Are the following two statements equivalent (wrt. ignored values):
>
>     a += b
>     a[:] = a + b
>
> For np.ma (a) is false whereas (b) is true. For arrays containing
> nans, on the other hand, (a) and (b) are both true (but of course, in
> this case values cannot be unmasked). Is there a way to define
> operations so that (a) is true, while retaining the desired other
> properties of arrays with ignored values?

I thought that PdC satisfied (a) and (b). Let me show you what I thought they were. Perhaps I am not being consistent; if so, point out my mistake.

(A) You are making two comparisons.

(A1) Does unmask(a + b) == unmask(b + a)? Yes. They both equal:

    unmask([11, IGNORED(22), IGNORED(33), IGNORED(44)]) = [11, 22, 33, 44]

(A2) Is `a` the same after "a += 42; a += b" and "a += b; a += 42"? Yes.

    >>> a = [1, 2, IGNORED(3), IGNORED(4)]
    >>> b = [10, IGNORED(20), 30, IGNORED(40)]
    >>> a += 42; a
    [43, 44, IGNORED(45), IGNORED(46)]
    >>> a += b; a
    [53, IGNORED(64), IGNORED(75), IGNORED(86)]

versus

    >>> a = [1, 2, IGNORED(3), IGNORED(4)]
    >>> b = [10, IGNORED(20), 30, IGNORED(40)]
    >>> a += b; a
    [11, IGNORED(22), IGNORED(33), IGNORED(44)]
    >>> a += 42; a
    [53, IGNORED(64), IGNORED(75), IGNORED(86)]

For part (B), I thought we were in agreement that in-place assignment should be defined so that

    a += b

is equivalent to

    tmp = a + b
    a = tmp

If so, this definitely holds. Have I missed something? Probably. Please spell it out for me.
Re: [Numpy-discussion] in the NA discussion, what can we agree on?
On Fri, Nov 4, 2011 at 8:03 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Fri, Nov 4, 2011 at 7:43 PM, T J <tjhn...@gmail.com> wrote:
>> On Fri, Nov 4, 2011 at 6:31 PM, Pauli Virtanen <p...@iki.fi> wrote:
>>> An acid test for proposed rules: given two arrays `a` and `b`,
>>>
>>>     a = [1, 2, IGNORED(3), IGNORED(4)]
>>>     b = [10, IGNORED(20), 30, IGNORED(40)]
>> [...]
>> (A1) Does unmask(a+b) == unmask(b + a)? Yes. They both equal:
>>
>>     unmask([11, IGNORED(22), IGNORED(33), IGNORED(44)]) = [11, 22, 33, 44]
>
> Again, I really don't think you're going to be able to sell an API where
>
>     [2] + [IGNORED(20)] == [IGNORED(22)]
>
> I mean, it's not me you have to convince, it's Gary, Pierre, maybe
> Benjamin, Lluís, etc. So I could be wrong. But you might want to figure
> that out first before making plans based on this...

But this is how np.ma currently does it, except that it doesn't compute the payload---it just calls it IGNORED. And it seems that this generalizes the way people want it to:

>>> z = [2, 4] + [IGNORED(20), 3]
>>> z
[IGNORED(22), 7]
>>> z.sum(skip_ignored=True)   # True could be the default
7
>>> z.sum(skip_ignored=False)
IGNORED(29)

I guess I am confused because it seems that you implicitly used this same rule here:

> Say we have
>
>     a = np.array([1, IGNORED(2), 3])
>     b = np.array([10, 20, 30])
>
> (Here I'm using IGNORED(2) to mean a value that is currently ignored,
> but if you unmasked it it would have the value 2.) Then we have:
>
>     # non-propagating **or** propagating, doesn't matter:
>     >>> a + 2
>     [3, IGNORED(2), 5]

That is, element-wise, you had to have done IGNORED(2) + 2 --> IGNORED(2). I said it should be equal to IGNORED(4), but the result is still some form of "ignore". Sorry if I am missing the bigger picture at this point... it's late, and it's a Friday.
[Numpy-discussion] Example Usage of Neighborhood Iterator in Cython
I recently put together a Cython example which uses the neighborhood iterator. It was trickier than I thought it would be, so I thought to share it with the community. The function takes a 1-dimensional array and returns a 2-dimensional array of neighborhoods in the original array. This is somewhat similar to the functionality provided by segment_axis (http://projects.scipy.org/numpy/ticket/901), but I believe this is slightly different in that the neighborhood can extend to the left of the current index (assuming circular boundaries). Keep in mind that this is just an example; normal usage probably is not concerned with creating a new array.

External link: http://codepad.org/czRIzXQl

--

import numpy as np
cimport numpy as np

cdef extern from "numpy/arrayobject.h":

    ctypedef extern class numpy.flatiter [object PyArrayIterObject]:
        cdef int nd_m1
        cdef np.npy_intp index, size
        cdef np.ndarray ao
        cdef char *dataptr

    # This isn't exposed to the Python API.
    # So we can't use the same approach we used to define flatiter.
    ctypedef struct PyArrayNeighborhoodIterObject:
        int nd_m1
        np.npy_intp index, size
        np.PyArrayObject *ao   # note the change from np.ndarray
        char *dataptr

    object PyArray_NeighborhoodIterNew(flatiter it, np.npy_intp* bounds,
                                       int mode, np.ndarray fill_value)
    int PyArrayNeighborhoodIter_Next(PyArrayNeighborhoodIterObject *it)
    int PyArrayNeighborhoodIter_Reset(PyArrayNeighborhoodIterObject *it)

    object PyArray_IterNew(object arr)
    void PyArray_ITER_NEXT(flatiter it)
    np.npy_intp PyArray_SIZE(np.ndarray arr)

    cdef enum:
        NPY_NEIGHBORHOOD_ITER_ZERO_PADDING,
        NPY_NEIGHBORHOOD_ITER_ONE_PADDING,
        NPY_NEIGHBORHOOD_ITER_CONSTANT_PADDING,
        NPY_NEIGHBORHOOD_ITER_CIRCULAR_PADDING,
        NPY_NEIGHBORHOOD_ITER_MIRROR_PADDING

np.import_array()

def windowed(np.ndarray[np.int_t, ndim=1] arr, bounds):

    cdef flatiter iterx = <flatiter>PyArray_IterNew(<object>arr)
    cdef np.npy_intp size = PyArray_SIZE(arr)
    cdef np.npy_intp* boundsPtr = [bounds[0], bounds[1]]
    cdef int hoodSize = bounds[1] - bounds[0] + 1

    # Create the Python object and keep a reference to it
    cdef object niterx_ = PyArray_NeighborhoodIterNew(iterx, boundsPtr,
        NPY_NEIGHBORHOOD_ITER_CIRCULAR_PADDING, None)
    cdef PyArrayNeighborhoodIterObject *niterx = \
        <PyArrayNeighborhoodIterObject *>niterx_

    cdef int i, j
    cdef np.ndarray[np.int_t, ndim=2] hoods

    hoods = np.empty((arr.shape[0], hoodSize), dtype=np.int)
    for i in range(iterx.size):
        for j in range(niterx.size):
            hoods[i, j] = (<np.int_t *>niterx.dataptr)[0]
            PyArrayNeighborhoodIter_Next(niterx)
        PyArray_ITER_NEXT(iterx)
        PyArrayNeighborhoodIter_Reset(niterx)
    return hoods

def test():
    x = np.arange(10)
    print x
    print
    print windowed(x, [-1, 3])
    print
    print windowed(x, [-2, 2])

--

If you run test(), this is what you should see:

[0 1 2 3 4 5 6 7 8 9]

[[9 0 1 2 3]
 [0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]
 [4 5 6 7 8]
 [5 6 7 8 9]
 [6 7 8 9 0]
 [7 8 9 0 1]
 [8 9 0 1 2]]

[[8 9 0 1 2]
 [9 0 1 2 3]
 [0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]
 [4 5 6 7 8]
 [5 6 7 8 9]
 [6 7 8 9 0]
 [7 8 9 0 1]]

windowed(x, [0, 2]) is almost like segment_axis(x, 3, 2, end='wrap').
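For comparison, a pure-NumPy sketch of the same circular windowing; the function name and approach here are illustrative (not from the original post) and are handy for checking the Cython version:

    import numpy as np

    def windowed_py(arr, bounds):
        # Circular neighborhoods via index arithmetic modulo the length.
        lo, hi = bounds
        n = len(arr)
        offsets = np.arange(lo, hi + 1)          # e.g. [-1, 0, 1, 2, 3]
        idx = (np.arange(n)[:, None] + offsets[None, :]) % n
        return arr[idx]

    x = np.arange(10)
    assert windowed_py(x, [-1, 3])[0].tolist() == [9, 0, 1, 2, 3]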
[Numpy-discussion] Non-rectangular neighborhoods
While reading the documentation for the neighborhood iterator, it seems that it can only handle rectangular neighborhoods. Have I understood this correctly? If it is possible to do non-rectangular regions, could someone post an example/sketch of how to do this?
Re: [Numpy-discussion] random number generator, entropy and pickling
On Mon, Apr 25, 2011 at 9:57 AM, Gael Varoquaux <gael.varoqu...@normalesup.org> wrote:
> We thought that we could simply have a PRNG per object, as in:
>
>     def __init__(self, prng=None):
>         if prng is None:
>             prng = np.random.RandomState()
>         self.prng = prng
>
> I don't like this option, because it means that with a given piece of
> code, setting the seed of numpy's PRNG isn't enough to make it
> reproducible. I couldn't retrieve a handle on a picklable instance for
> the global PRNG. The only option I can see would be to use the global
> numpy PRNG to seed an instance-specific RandomState, as in:
>
>     def __init__(self, prng=None):
>         if prng is None:
>             prng = np.random.RandomState(np.random.random())
>         self.prng = prng
>
> That way seeding the global PRNG really does control the full random
> number generation. I am wondering if it would have an adverse
> consequence on the entropy of the stream of random numbers. Does
> anybody have suggestions? Advice?

If code A relies on code B (e.g., some numpy function) and code B changes, then the stream of random numbers will no longer be the same. The point here is that the user wrote code A but depended on code B, and even though code A was unchanged, their random numbers were not the same. The situation is improved if scikits.learn uses its own global RandomState instance. Then code A will at least give the same stream of random numbers for a fixed version of scikits.learn. It should be made very clear, though, that the data stream cannot be expected to be the same across versions.

As to each object having its own RandomState instance, I definitely see that it makes restoring the overall state of a piece of code harder, but perhaps utility functions could make this easier. I can imagine that users might want to arbitrarily set the seed for a particular object in the midst of a larger piece of code. Perhaps the user is testing known failure cases of an algorithm A interacting with an algorithm B. If the user wants to loop over known seeds which cause algorithm A to fail but needs algorithm B to keep its seed fixed, then it seems like having a global seed makes this more difficult. In that sense, it might be desirable to have independent PRNGs. On the other hand, maybe that is an uncommon use case that could be handled through manually setting the seed. Post back on what you guys decide.
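A minimal sketch of the "seed the child from the global PRNG" pattern under discussion. Drawing an integer seed is my assumption here (RandomState seeds are canonically integers, whereas np.random.random() yields a float), and the class name is illustrative:

    import numpy as np

    class Estimator:
        def __init__(self, prng=None):
            if prng is None:
                # A per-object stream whose seed is still controlled by
                # the global PRNG, so np.random.seed(...) keeps runs
                # reproducible.
                prng = np.random.RandomState(np.random.randint(2**31 - 1))
            self.prng = prng

    np.random.seed(0)
    e1 = Estimator()
    np.random.seed(0)
    e2 = Estimator()
    assert e1.prng.randint(100) == e2.prng.randint(100)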
Re: [Numpy-discussion] einsum
On Wed, Jan 26, 2011 at 5:02 PM, Joshua Holbrook <josh.holbr...@gmail.com> wrote:
>> Ah, sorry for misunderstanding. That would actually be very difficult,
>> as the iterator required a fair bit of fixes and adjustments to the
>> core. The new_iterator branch should be 1.5 ABI compatible, if that
>> helps.
>
> I see. Perhaps the fixes and adjustments can/should be included with
> standard numpy, even if the Einstein notation package is made a
> separate module.
> [snip]
> Indeed, I would like to be able to install and use einsum() without
> having to install another version of numpy. Even if it depends on
> features of a new numpy, it'd be nice to have it be a separate module.

I don't really understand the desire to have this single function exist in a separate package. If it requires the new version of NumPy, then you'll have to install/upgrade either way... and if it comes as part of that new NumPy, then you are already set. Doesn't a separate package complicate things unnecessarily? It would make sense to me if einsum consisted of many functions (such as Bottleneck).
[Numpy-discussion] Build failure at rev8246
Hi, I tried upgrading today and had trouble building numpy (after rm -rf build). My full build log is here: http://www.filedump.net/dumped/build1274454454.txt

If someone can point me in the right direction, I'd appreciate it very much. Two excerpts from the log file:

Running from numpy source directory.numpy/core/setup_common.py:86: MismatchCAPIWarning: API mismatch detected, the C API version numbers have to be updated. Current C api version is 4, with checksum 59750b518272c8987f02d66445afd3f1, but recorded checksum for C API version 4 in codegen_dir/cversions.txt is 3d8940bf7b0d2a4e25be4338c14c3c85. If functions were added in the C API, you have to update C_API_VERSION in numpy/core/setup_common.pyc. MismatchCAPIWarning)

In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:36:
numpy/core/src/multiarray/buffer.c: At top level:
numpy/core/src/multiarray/buffer.c:715: error: conflicting types for ‘_descriptor_from_pep3118_format’
numpy/core/src/multiarray/common.c:221: note: previous implicit declaration of ‘_descriptor_from_pep3118_format’ was here
numpy/core/src/multiarray/buffer.c: In function ‘_descriptor_from_pep3118_format’:
numpy/core/src/multiarray/buffer.c:751: warning: assignment from incompatible pointer type
In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:45:
numpy/core/src/multiarray/multiarraymodule.c: In function ‘initmultiarray’:
numpy/core/src/multiarray/multiarraymodule.c:3062: warning: implicit declaration of function ‘_numpymemoryview_init’
error: Command gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -Inumpy/core/include -Ibuild/src.linux-i686-2.6/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/usr/include/python2.6 -Ibuild/src.linux-i686-2.6/numpy/core/src/multiarray -Ibuild/src.linux-i686-2.6/numpy/core/src/umath -c numpy/core/src/multiarray/multiarraymodule_onefile.c -o build/temp.linux-i686-2.6/numpy/core/src/multiarray/multiarraymodule_onefile.o failed with exit status 1

Hope this helps.
Re: [Numpy-discussion] Build failure at rev8246
On Fri, May 21, 2010 at 8:51 AM, Pauli Virtanen <p...@iki.fi> wrote:
> Fri, 21 May 2010 08:09:55 -0700, T J wrote:
>> I tried upgrading today and had trouble building numpy (after rm -rf
>> build). My full build log is here:
>> http://www.filedump.net/dumped/build1274454454.txt
>
> Your SVN checkout might be corrupted, containing a mix of old and new
> files. Try building from a clean checkout.

That was it! When I did an svn update, I noticed conflicts all over the place and accepted them with "tf" (theirs-full), but I guess that was not enough. Given that I haven't really touched this directory, what was the likely cause of this corruption? The last time I ran svn update was a few months ago.
Re: [Numpy-discussion] pareto docstring
On Mon, May 10, 2010 at 8:37 PM, <josef.p...@gmail.com> wrote:
> I went googling and found a new interpretation:
>
> numpy.random.pareto is actually the Lomax distribution, also known as
> Pareto 2, Pareto (II) or Pareto Second Kind distribution.

Great!

> So, from this it looks like numpy.random does not have a Pareto
> distribution, only Lomax, and the confusion came maybe because
> somewhere in the history the (II) (second kind) got dropped in the
> explanations. And actually it is in scipy.stats.distributions, but
> without rvs:
>
>     # LOMAX (Pareto of the second kind.)
>     # Special case of Pareto of the first kind (location=-1.0)

I understand the point of this last comment, but I think it can be confusing in that the Pareto (of the first kind) has no location parameter, and people might think you are referring to the generalized Pareto distribution. I think it's much clearer to say:

    # Special case of the Pareto of the first kind, but shifted to
    # the left by 1:  x --> x + 1

>> 2) Modify numpy/random/mtrand/distributions.c in the following way:
>>
>>     double rk_pareto(rk_state *state, double a)
>>     {
>>         /* return exp(rk_standard_exponential(state)/a) - 1; */
>>         return 1.0 / pow(rk_double(state), 1.0 / a);
>>     }
>
> I'm not an expert on random number generators, but using the uniform
> distribution, as in
> http://en.wikipedia.org/wiki/Pareto_distribution#Generating_a_random_sample_from_Pareto_distribution
> and your Devroye reference, seems better than basing it on the
> relationship to the exponential distribution:
> http://en.wikipedia.org/wiki/Pareto_distribution#Relation_to_the_exponential_distribution

Correct. The exp relationship was for the existing implementation (which corresponds to the Lomax). I commented that line out and just used 1/U^(1/a).

> I think without changing the source we can rewrite the docstring to
> say that this is Lomax (or Pareto of the Second Kind), so that at
> least the documentation is less misleading. But I find calling it
> Pareto very confusing, and I'm not the only one anymore (and I don't
> know if anyone has used it assuming it is classical Pareto), so my
> preferred solution would be:
>
> * rename numpy.random.pareto to numpy.random.lomax
> * and create a real (classical, first kind) pareto distribution (even
>   though it's just adding or subtracting 1, once we know it)

I personally have used numpy.random.pareto thinking it was the Pareto distribution of the first kind---which led to this post in the first place. So, I'm in strong agreement. While doing this, perhaps we should increase functionality and allow users the ability to specify the scale of the distribution (instead of just the shape)? I can make a ticket for this and give a stab at creating the necessary patch.

What's the backwards compatibility policy with very confusing names in numpy? It seems reasonable that we might have to follow the deprecation route, but I'd be happier with a faster fix.

1.5: Provide numpy.random.lomax. Make numpy.random.pareto raise a DeprecationWarning and then call lomax.
2.0 (if there is no 1.6): Make numpy.random.pareto behave as the Pareto distribution of the first kind.

Immediately, though, we can modify the docstring that is currently in there to make the situation clear, instructing users how they can generate samples from the standard Pareto distribution. This is the first patch I'll submit. Perhaps it is better to only change the docstring and then save all changes in functionality for 2.0. Deferring to others on this one...
Re: [Numpy-discussion] pareto docstring
On Sun, May 9, 2010 at 4:49 AM, <josef.p...@gmail.com> wrote:
> I think this is the same point I was trying to make last year.
> Instead of renormalizing, my conclusion was the following (copied from
> the mailing list August last year):
>
>   my conclusion: What numpy.random.pareto actually produces are random
>   numbers from a pareto distribution with lower bound m=1, but
>   location parameter loc=-1, that shifts the distribution to the left.
>   To actually get useful random numbers (that are correct in the usual
>   usage http://en.wikipedia.org/wiki/Pareto_distribution), we need to
>   add 1 to them. stats.distributions doesn't use mtrand.pareto:
>
>       rvs_pareto = 1 + numpy.random.pareto(a, size)
>
> I still have to work through the math of your argument, but maybe we
> can come to an agreement on how the docstrings (or the function)
> should be changed, and what numpy.random.pareto really means.
>
> Josef (grateful that there is another set of eyes on this)

Yes, I think my "renormalizing" statement is incorrect, as it is really just sampling from a different pdf altogether. See the following image: http://www.dumpt.com/img/viewer.php?file=q9tfk7ehxsw865vn067c.png

It plots histograms of the various implementations against the pdfs. Summarizing:

The NumPy implementation is based on (Devroye, p. 262). The pdf listed there is:

    a / (1+x)^(a+1)

This differs from the standard Pareto pdf:

    a / x^(a+1)

It also differs from the pdf of the generalized Pareto distribution, with scale=1 and location=0:

    (1 + a x)^(-1/a - 1)

And it also differs from the pdf of the generalized Pareto distribution with scale=1 and location=-1 or location=1.

random.paretovariate and scipy.stats.pareto sample from the standard Pareto, and this is the desired behavior, IMO. It's true that 1 + np.random.pareto provides the fix, but I think we're better off changing the underlying implementation. Devroye has a more recent paper:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.8760

which states the Pareto distribution in the standard way. So I think it is safe to make this change. Backwards compatibility might be the only argument for not making this change. So here is my proposal:

1) Remove every mention of the generalized Pareto distribution from the docstring. As far as I can see, the generalized Pareto distribution does not reduce to the standard Pareto at all. We can still mention scipy.stats.distributions.genpareto and scipy.stats.distributions.pareto. The former is something different, and the latter will (now) be equivalent to the NumPy function.

2) Modify numpy/random/mtrand/distributions.c in the following way:

    double rk_pareto(rk_state *state, double a)
    {
        /* return exp(rk_standard_exponential(state)/a) - 1; */
        return 1.0 / pow(rk_double(state), 1.0 / a);
    }

Does this sound good?
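A quick hedged check of the proposal from Python, comparing inverse-CDF sampling (the C one-liner above) with the 1 + np.random.pareto workaround quoted earlier; both should follow the standard Pareto with x_m = 1:

    import numpy as np

    a, n = 3.0, 200000
    u = np.random.uniform(size=n)

    pareto_inv = 1.0 / u ** (1.0 / a)            # inverse-CDF sampling
    pareto_shift = 1.0 + np.random.pareto(a, n)  # shift the Lomax variate

    # For a = 3 the standard Pareto mean is a/(a-1) = 1.5; both
    # estimates should land close to it.
    print(pareto_inv.mean(), pareto_shift.mean())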
[Numpy-discussion] pareto docstring
The docstring for np.pareto says: This is a simplified version of the Generalized Pareto distribution (available in SciPy), with the scale set to one and the location set to zero. Most authors default the location to one. and also: The probability density for the Pareto distribution is .. math:: p(x) = \frac{am^a}{x^{a+1}} where :math:`a` is the shape and :math:`m` the location These two statements seem to be in contradiction. I think what was meant is that m is the scale, rather than the location. For if m were equal to zero, as the first portion of the docstring states, then the entire pdf would be zero for all shapes a > 0. Also, I'm not quite understanding how the stated pdf is actually the same as the pdf for the generalized Pareto with scale=1 and location=0. By the Wikipedia definition of the generalized Pareto distribution, if we take \sigma=1 (scale equal to one) and \mu=0 (location equal to zero), then we get: (1 + a x)^(-1/a - 1) which is normalized over $x \in (0, \infty)$. If we compare this to the distribution stated in the docstring (with m=1): a x^{-a-1} we see that it is normalized over $x \in (1, \infty)$. And indeed, the distribution requires x > scale = 1. If we integrate the generalized Pareto (with scale=1, location=0) over $x \in (1, \infty)$, then we have to re-normalize. So should the docstring say: This is a simplified version of the Generalized Pareto distribution (available in SciPy), with the scale set to one, the location set to zero, and the distribution re-normalized over the range (1, \infty). Most authors default the location to one. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Remove duplicate columns
Hi, Is there a way to sort the columns in an array? I need to sort it so that I can easily go through and keep only the unique columns. ndarray.sort(axis=1) doesn't do what I want as it destroys the relative ordering between the various columns. For example, I would like: [[2,1,3], [3,5,1], [0,3,1]] to go to: [[1,2,3], [5,3,1], [3,0,1]] (swap the first and second columns). So I want to treat the columns as objects and sort them. I can do this if I convert to a python list, but I was hoping to avoid doing that because I ultimately need to do element-wise bitwise operations. Thanks! ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
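One loop-free possibility for the sorting step, sketched under the assumption that "sorted columns" means lexicographic order with the first row as the primary key (np.lexsort treats its last key as primary, hence the row reversal):

    import numpy as np

    x = np.array([[2, 1, 3],
                  [3, 5, 1],
                  [0, 3, 1]])

    # reverse the rows so that row 0 becomes lexsort's primary key
    order = np.lexsort(x[::-1])
    x_sorted = x[:, order]
    # x_sorted is [[1, 2, 3], [5, 3, 1], [3, 0, 1]], as requested above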
Re: [Numpy-discussion] Remove duplicate columns
On Thu, May 6, 2010 at 10:36 AM, josef.p...@gmail.com wrote: There is a thread from last August on unique rows which might be useful, and a thread in Dec 2008 for sorting rows, something like np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1]) maybe it's np.unique with numpy 1.4. The thread is useful: http://www.mail-archive.com/numpy-discussion@scipy.org/msg19830.html I'll have to see if it is quicker for me to just do: y = x.transpose().tolist() y.sort() x = np.array(y).transpose() ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
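Adapting the row-based view trick from that thread to columns only requires a transpose (plus a contiguous copy, since the structured view needs contiguous memory). A sketch, assuming np.unique accepts structured arrays as in newer releases (np.unique1d in older ones):

    import numpy as np

    x = np.array([[2, 1, 2],
                  [3, 5, 3],
                  [0, 3, 0]])

    c = np.ascontiguousarray(x.T)             # rows of c are the columns of x
    v = c.view([('', c.dtype)] * c.shape[1])  # one record per column
    u = np.unique(v).view(c.dtype).reshape(-1, c.shape[1])
    unique_cols = u.T                         # back to columns; here 2 remain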
Re: [Numpy-discussion] What should be the value of nansum of nan's?
On Mon, Apr 26, 2010 at 10:03 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Apr 26, 2010 at 10:55 AM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, We need to make a decision for ticket #1123 regarding what nansum should return when all values are nan. At some earlier point it was zero, but currently it is nan, in fact it is nan whatever the operation is. That is consistent, simple and serves to mark the array or axis as containing all nans. I would like to close the ticket and am a bit inclined to go with the current behaviour although there is an argument to be made for returning 0 for the nansum case. Thoughts? To add a bit of context, one could argue that the results should be consistent with the equivalent operations on empty arrays and always be non-nan. In [1]: nansum([]) Out[1]: nan In [2]: sum([]) Out[2]: 0.0 This seems like an obvious one to me. What is the spirit of nansum? Return the sum of array elements over a given axis treating Not a Numbers (NaNs) as zero. Okay. So NaNs in an array are treated as zeros and the sum is performed as one normally would perform it starting with an initial sum of zero. So if all values are NaN, then we add nothing to our original sum and still return 0. I'm not sure I understand the argument that it should return NaN. It is counter to the *purpose* of nansum. Also, if one wants to determine if all values in an array are NaN, isn't there another way? Let's keep (or make) those distinct operations, as they are definitely distinct concepts. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
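For concreteness, the "NaNs are zeros" semantics being argued for amounts to something like this sketch (not how nansum is actually implemented):

    import numpy as np

    def nansum_as_zero(a, axis=None):
        # replace every NaN with 0, then take an ordinary sum;
        # an all-NaN slice therefore yields 0.0, never NaN
        a = np.asarray(a, dtype=float)
        return np.where(np.isnan(a), 0.0, a).sum(axis=axis)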
Re: [Numpy-discussion] How do I ensure numpy headers are present in setup.py?
On Mon, Apr 5, 2010 at 11:28 AM, Robert Kern robert.k...@gmail.com wrote: On Mon, Apr 5, 2010 at 13:26, Erik Tollerud erik.tolle...@gmail.com wrote: Hmm, unfortunate. So the best approach then is probably just to tell people to install numpy first, then my package? Yup. And really, this isn't that unreasonable. Not only does this make users more aware of their environment (i.e., the distinction between your package and the major numerical package in Python), but it's so much cleaner. With the combined approach, any NumPy installation problems would be (frequently) associated with your package. On the other hand, if there are any NumPy installation problems in the separated approach, at least your package is blameless. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Bug in logaddexp2.reduce
Hi, I'm getting some strange behavior with logaddexp2.reduce:

    from itertools import permutations
    import numpy as np

    x = np.array([-53.584962500721154, -1.5849625007211563, -0.5849625007211563])
    for p in permutations([0,1,2]):
        print p, np.logaddexp2.reduce(x[list(p)])

Essentially, the result depends on the order of the array...and we get nans in the bad orders. Likely, this also affects logaddexp. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in logaddexp2.reduce
On Wed, Mar 31, 2010 at 10:30 AM, T J tjhn...@gmail.com wrote: Hi, I'm getting some strange behavior with logaddexp2.reduce: from itertools import permutations import numpy as np x = np.array([-53.584962500721154, -1.5849625007211563, -0.5849625007211563]) for p in permutations([0,1,2]): print p, np.logaddexp2.reduce(x[list(p)]) Essentially, the result depends on the order of the array...and we get nans in the bad orders. Likely, this also affects logaddexp. Sorry, forgot version information: $ python -c import numpy;print numpy.__version__ 1.5.0.dev8106 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in logaddexp2.reduce
On Wed, Mar 31, 2010 at 1:21 PM, Charles R Harris charlesr.har...@gmail.com wrote: Looks like roundoff error. So this is expected behavior? In [1]: np.logaddexp2(-1.5849625007211563, -53.584962500721154) Out[1]: -1.5849625007211561 In [2]: np.logaddexp2(-0.5849625007211563, -53.584962500721154) Out[2]: nan In [3]: np.log2(np.exp2(-0.5849625007211563) + np.exp2(-53.584962500721154)) Out[3]: -0.58496250072115608 Shouldn't the result at least behave nicely and just return the larger value? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in logaddexp2.reduce
On Wed, Mar 31, 2010 at 3:36 PM, Charles R Harris charlesr.har...@gmail.com wrote: So this is expected behavior? In [1]: np.logaddexp2(-1.5849625007211563, -53.584962500721154) Out[1]: -1.5849625007211561 In [2]: np.logaddexp2(-0.5849625007211563, -53.584962500721154) Out[2]: nan I don't see that In [1]: np.logaddexp2(-0.5849625007211563, -53.584962500721154) Out[1]: -0.58496250072115619 In [2]: np.logaddexp2(-1.5849625007211563, -53.584962500721154) Out[2]: -1.5849625007211561 What system are you running on. $ python --version Python 2.6.4 $ uname -a Linux localhost 2.6.31-20-generic-pae #58-Ubuntu SMP Fri Mar 12 06:25:51 UTC 2010 i686 GNU/Linux ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in logaddexp2.reduce
On Wed, Mar 31, 2010 at 3:38 PM, David Warde-Farley d...@cs.toronto.edu wrote: Unfortunately there's no good way of getting around order-of-operations-related rounding error using the reduce() machinery, that I know of. That seems reasonable, but receiving a nan, in this case, does not. Are my expectations unreasonable? Would sorting the values before reducing be an ugly(-enough) solution? It seems to fix it in this particular case. Or is the method you posted in the link below a better solution? I'd like to see a 'logsumexp' and 'logsumexp2' in NumPy instead, using the generalized ufunc architecture, to do it over an arbitrary dimension of an array, rather than needing to invoke 'reduce' on logaddexp. I tried my hand at writing one but I couldn't manage to get it working for some reason, and I haven't had time to revisit it: http://mail.scipy.org/pipermail/numpy-discussion/2010-January/048067.html I saw this original post and was hoping someone would post a response. Maybe now... ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
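For what it's worth, a minimal sketch of the usual shift-by-the-max trick, which sidesteps both the order dependence and the nan; the function name is made up and this handles only 1-D input:

    import numpy as np

    def logsumexp2(x):
        # log2(sum(2**x)) computed stably: factor out the maximum so the
        # shifted exponents are all <= 0 and exp2 cannot overflow
        m = np.max(x)
        return m + np.log2(np.sum(np.exp2(x - m)))

An input consisting entirely of -inf values still comes out as nan under this sketch; that corner case would need special handling.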
Re: [Numpy-discussion] Bug in logaddexp2.reduce
On Wed, Mar 31, 2010 at 7:06 PM, Charles R Harris charlesr.har...@gmail.com wrote: That is a 32 bit kernel, right? Correct. Regarding the config.h, which config.h? I have a numpyconfig.h. Which compilation options should I obtain and how? When I run setup.py, I see: C compiler: gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/include -Ibuild/src.linux-i686-2.6/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/usr/include/python2.6 -Ibuild/src.linux-i686-2.6/numpy/core/src/multiarray -Ibuild/src.linux-i686-2.6/numpy/core/src/umath -c' Here is my show_config(). atlas_threads_info: NOT AVAILABLE blas_opt_info: libraries = ['f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib/sse2'] define_macros = [('ATLAS_INFO', '\\3.6.0\\')] language = c include_dirs = ['/usr/include'] atlas_blas_threads_info: NOT AVAILABLE lapack_opt_info: libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib/sse2/atlas', '/usr/lib/sse2'] define_macros = [('ATLAS_INFO', '\\3.6.0\\')] language = f77 include_dirs = ['/usr/include'] atlas_info: libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib/sse2/atlas', '/usr/lib/sse2'] language = f77 include_dirs = ['/usr/include'] lapack_mkl_info: NOT AVAILABLE blas_mkl_info: NOT AVAILABLE atlas_blas_info: libraries = ['f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib/sse2'] language = c include_dirs = ['/usr/include'] mkl_info: NOT AVAILABLE $ gcc --version gcc (Ubuntu 4.4.1-4ubuntu9) 4.4.1 Copyright (C) 2009 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Uninformative Error
When passing in a list of longs and asking that the dtype be a float (yes, losing precision), the error message is uninformative whenever the long is larger than the largest float. x = 181626642333486640664316511479918087634811756599984861278481913634852446858952226941059178462566942027148832976486383692715763966132465634039844094073670028044755150133224694791817752891901042496950233943249209777416692569138779593594686170807571874640682826295728116325492852625325418526603207268018328608840 array(x, dtype=float) OverflowError: long int too large to convert to float array([x], dtype=float) ValueError: setting an array element with a sequence. The first error is informative, but the second is not and will occur anytime one tries to convert a python list containing longs which are too long. Is there a way this error message could be made more helpful? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Repeated dot products
Hi, Suppose I have an array of shape: (n, k, k). In this case, I have n k-by-k matrices. My goal is to compute the product of a (potentially large) user-specified selection (with replacement) of these matrices. For example, x = [0,1,2,1,3,3,2,1,3,2,1,5,3,2,3,5,2,5,3,2,1,3,5,6] says that I want to take the 0th matrix and dot it with the 1st matrix and dot that product with the 2nd matrix and dot that product with the 1st matrix again and so on... Essentially, I am looking for efficient ways of doing this. It seems like what I *need* is for dot to be a ufunc with a reduce() operator. Then I would construct an array of the matrices, as specified by the input x. For now, I am using a python loop and this is unbearable. prod = np.eye(k) for i in x: ... prod = dot(prod, matrices[i]) ... Is there a better way? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
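Short of a true dot.reduce, the explicit loop can at least be folded into reduce; a sketch (this doesn't change the asymptotic cost, it only tidies the code):

    import numpy as np

    n, k = 8, 4
    matrices = np.random.rand(n, k, k)
    x = [0, 1, 2, 1, 3, 3, 2, 1]

    # fancy indexing gathers the selected matrices in order,
    # then reduce folds np.dot across them pairwise
    # (reduce is a builtin in Python 2; from functools import reduce elsewhere)
    prod = reduce(np.dot, matrices[x])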
[Numpy-discussion] Row-wise dot product?
Is there a better way to achieve the following, perhaps without the python for loop? x.shape (1,3) y.shape (1,3) z = empty(len(x)) for i in range(1): ...z[i] = dot(x[i], y[i]) ... ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
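One loop-free version, at the cost of a temporary the size of x (a sketch):

    import numpy as np

    x = np.random.rand(10000, 3)
    y = np.random.rand(10000, 3)

    # elementwise product, then collapse each row: z[i] == dot(x[i], y[i])
    z = (x * y).sum(axis=1)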
Re: [Numpy-discussion] Row-wise dot product?
On Mon, Sep 7, 2009 at 3:27 PM, T J tjhn...@gmail.com wrote: On Mon, Sep 7, 2009 at 7:09 AM, Hans-Andreas Engel eng...@deshaw.com wrote: If you wish to avoid the extra memory allocation implied by `x*y' and get a ~4x speed-up, you can use a generalized ufunc (numpy >= 1.3, stolen from the testcases): z = numpy.core.umath_tests.inner1d(x, y) This is exactly what I was hoping for. Now, I can also keep an array of vectors and apply a rotation matrix to each vector. I spoke too soon. inner1d will not allow me to rotate each row in the array. Is there another function that will help with this? If I'm understanding the signature for generalized ufuncs, it looks like I need: (i),(i,j)->(i) Or perhaps I am just being dense. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Row-wise dot product?
On Mon, Sep 7, 2009 at 3:43 PM, T Jtjhn...@gmail.com wrote: Or perhaps I am just being dense. Yes. I just tried to reinvent standard matrix multiplication. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
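Spelled out, rotating every row vector at once is just a dot with the transposed matrix; a sketch:

    import numpy as np

    theta = 0.3
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
    vectors = np.random.rand(100, 3)   # one 3-vector per row

    # (R v)^T == v^T R^T, so applying R to every row is a single matrix product
    rotated = np.dot(vectors, R.T)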
Re: [Numpy-discussion] Row-wise dot product?
On Mon, Sep 7, 2009 at 7:09 AM, Hans-Andreas Engel eng...@deshaw.com wrote: If you wish to avoid the extra memory allocation implied by `x*y' and get a ~4x speed-up, you can use a generalized ufunc (numpy >= 1.3, stolen from the testcases): z = numpy.core.umath_tests.inner1d(x, y) This is exactly what I was hoping for. Now, I can also keep an array of vectors and apply a rotation matrix to each vector. Hopefully, these use cases serve as good proof of why the generalized ufunc machinery is useful. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Specifying Index Programmatically
I have an array, and I need to index it like so: z[...,x,:] How can I write code which will index z, as above, when x is not known ahead of time. For that matter, the particular dimension I am querying is not known either. In case this is still confusing, I am looking for the NumPy way to do: z[(...,5,:)] z[(:, 3, :, 5, ...)] z[(1, ..., 5)] Basically, I want to be able to pre-construct what should appear inside the []. The numbers are no problem, but I'm having trouble with the ellipsis and colon. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] reduce function of vectorize doesn't respect dtype?
On Fri, Aug 7, 2009 at 11:54 AM, T J tjhn...@gmail.com wrote: The reduce function of ufunc of a vectorized function doesn't seem to respect the dtype. def a(x,y): return x+y b = vectorize(a) c = array([1,2]) b(c, c) # use once to populate b.ufunc d = b.ufunc.reduce(c) c.dtype, type(d) (dtype('int32'), <type 'int'>) c = array([[1,2,3],[4,5,6]]) b.ufunc.reduce(c) array([5, 7, 9], dtype=object) So is this a bug? Or am I doing something wrong? In the second example d = b.ufunc.reduce(c) type(d[0]) <type 'int'> d.dtype dtype('object') ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Specifying Index Programmatically
On Sat, Aug 8, 2009 at 8:54 PM, Neil Martinsen-Burrelln...@wartburg.edu wrote: The ellipsis is a built-in python constant called Ellipsis. The colon is a slice object, again a python built-in, called with None as an argument. So, z[...,2,:] == z[Ellipsis,2,slice(None)]. Very helpful! Thank you. I didn't run into this information in any of the indexing tutorials I ran through. If I get some time, I'll try to add it. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
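For example, one way to pre-build the index tuple (a small sketch):

    import numpy as np

    z = np.arange(2 * 4 * 3).reshape(2, 4, 3)
    x = 2  # position known only at runtime

    idx = (Ellipsis, x, slice(None))   # the programmatic form of z[..., x, :]
    assert (z[idx] == z[..., 2, :]).all()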
[Numpy-discussion] Indexing with a list...
z = array([1,2,3,4]) z[[1]] array([2]) z[(1,)] 2 I'm just curious: What is the motivation for this differing behavior? Is it a necessary consequence of, for example, the following: z[z < 3] array([1, 2]) ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Vectorize ufunc
I was wondering why vectorize doesn't make the ufunc available at the topmost level def a(x,y): return x + y b = vectorize(a) b.reduce Instead, the ufunc is stored at b.ufunc. Also, b.ufunc.reduce() doesn't seem to exist until I *use* the vectorized function at least once. Can this be changed so that it exists right away (and preferably at b.reduce instead of b.ufunc.reduce)? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] reduce function of vectorize doesn't respect dtype?
The reduce function of ufunc of a vectorized function doesn't seem to respect the dtype. def a(x,y): return x+y b = vectorize(a) c = array([1,2]) b(c, c) # use once to populate b.ufunc d = b.ufunc.reduce(c) c.dtype, type(d) (dtype('int32'), <type 'int'>) c = array([[1,2,3],[4,5,6]]) b.ufunc.reduce(c) array([5, 7, 9], dtype=object) My goal is to use the output of vectorize() as if it is actually a ufunc. So I'd really like to just type: b.reduce, b.accumulate, etc. And I don't want to have to write (after using it once to populate b.ufunc): b.ufunc.reduce(c).astype(numpy.int32) ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] dot documentation
Hi, the documentation for dot says that a value error is raised if: If the last dimension of a is not the same size as the second-to-last dimension of b. (http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.htm) This doesn't appear to be the case: a = array([[1,2],[3,4]]) b = array([1,2]) dot(a,b) array([5,11]) I can see *how* 5,11 is obtained, but it seems this should have raised a ValueError since the 2 != 1. So the actual code must do something more involved. When I think about broadcasting, it seems that maybe b should have been broadcasted to: --> array([[1,2],[1,2]]) and then the multiplication done as normal (but this would give a 2x2 result). Can someone explain this to me? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] dot documentation
Oh. b.shape = (2,). So I suppose the second to last dimension is, in fact, the last dimension...and 2 == 2. nvm On Fri, Aug 7, 2009 at 2:19 PM, T Jtjhn...@gmail.com wrote: Hi, the documentation for dot says that a value error is raised if: If the last dimension of a is not the same size as the second-to-last dimension of b. (http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.htm) This doesn't appear to be the case: a = array([[1,2],[3,4]]) b = array([1,2]) dot(a,b) array([5,11]) I can see *how* 5,11 is obtained, but it seems this should have raised a ValueError since the 2 != 1. So the actual code must do something more involved. When I think about broadcasting, it seems that maybe b should have been broadcasted to: -- array([[1,2],[1,2]]) and then the multiplication done as normal (but this would give a 2x2 result). Can someone explain this to me? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
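A quick check of that reading (nothing more than an illustration):

    import numpy as np

    a = np.array([[1, 2], [3, 4]])
    b = np.array([1, 2])

    # for 1-D b, dot contracts a's last axis against b's only axis
    assert np.dot(a, b).tolist() == [5, 11]
    assert np.dot(a, b)[0] == a[0, 0] * b[0] + a[0, 1] * b[1]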
[Numpy-discussion] Repeated Dot Operations
Hi, Is there a good way to perform dot on an arbitrary list of arrays which avoids using a loop? Here is what I'd like to avoid: # m1, m2, m3 are arrays out = np.eye(m1.shape[0]) prod = [m1, m2, m3, m1, m2, m3, m3, m2] for m in prod: ... out = np.dot(out, m) ... I was hoping for something like np.dot.reduce(prod). ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] python numpy code many times slower than c++
On Tue, Jan 20, 2009 at 6:57 PM, Neal Becker ndbeck...@gmail.com wrote: It seems the big chunks of time are used in data conversion between numpy and my own vectors classes. Mine are wrappers around boost::ublas. The conversion must be falling back on a very inefficient method since there is no special code to handle numpy vectors. Not sure what is the best solution. It would be _great_ if I could make boost::python objects that export a buffer interface, but I have absolutely no idea how to do this (and so far noone else has volunteered any info on this). I'm not sure if I've understood everything here, but I think that pyublas provides exactly what you need. http://tiker.net/doc/pyublas/ ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] New ufuncs
On Mon, Nov 10, 2008 at 4:05 PM, Charles R Harris [EMAIL PROTECTED] wrote: I added log2 and exp2. I still need to do the complex versions. I think logaddexp2 should go in also to complement these. Same here, especially since logaddexp is present. Or was the idea that both logaddexp and logaddexp2 should be moved to scipy.special? Note that MPL also defines log2 and their version has slightly different properties, i.e., it returns integer values for integer powers of two. I'm just curious now. Can someone comment on the difference in the implementation just committed versus that in cephes? http://projects.scipy.org/scipy/scipy/browser/trunk/scipy/special/cephes/exp2.c The difference won't matter to me as far as usage goes, but I was curious. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] New ufuncs
On Thu, Nov 6, 2008 at 3:01 PM, T J [EMAIL PROTECTED] wrote: On Thu, Nov 6, 2008 at 2:36 PM, Charles R Harris [EMAIL PROTECTED] wrote: I could add exp2, log2, and logaddexp2 pretty easily. Almost too easily, I don't want to clutter up numpy with a lot of functions. However, if there is a community for these functions I will put them in. I worry about clutter as well. Note that scipy provides log2 and exp2 already (scipy.special). So I think only logaddexp2 would be needed and (eventually) logdotexp2. Maybe scipy.special is a better place than in numpy? Then perhaps the clutter could be avoided, though I'm probably not the best one to ask for advice on this. I will definitely use the functions and I suspect many others will as well---wherever they are placed. Since no one commented further on this, can we go ahead and add logaddexp2? Once in svn, we can always deal with 'location' later---I just don't want it to get forgotten. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] atlas not found, why?
On Fri, Nov 7, 2008 at 2:16 AM, David Cournapeau [EMAIL PROTECTED] wrote: And you have no site.cfg at all ? Wow. I was too focused on the current directory and didn't realize I had an old site.cfg in ~/. Two points: 1) Others (myself included) might catch such silliness sooner if the location of the cfg file were printed to screen during setup.py. As of now, I get: Running from numpy source directory. non-existing path in 'numpy/distutils': 'site.cfg' F2PY Version 2_5986 ... 2) The current system_info.py says: The file 'site.cfg' is looked for in 1) Directory of main setup.py file being run. 2) Home directory of user running the setup.py file as ~/.numpy-site.cfg 3) System wide directory (location of this file...) Doesn't this mean that it should *not* have picked up my ~/site.cfg? Anyway, I can now report that ATLAS is detected without defining any environment variables. Thanks for all the help! ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] atlas not found, why?
On Fri, Nov 7, 2008 at 1:58 AM, T J [EMAIL PROTECTED] wrote: That the fortran wrappers were compiled using g77 is also apparent via what is printed out during setup when ATLAS is detected: gcc -pthread _configtest.o -L/usr/lib/atlas -llapack -lblas -o _configtest ATLAS version 3.6.0 built by root on Fri Jan 9 15:57:20 UTC 2004: UNAME: Linux intech67 2.4.20 #1 SMP Fri Jan 10 18:29:51 EST 2003 i686 GNU/Linux INSTFLG : MMDEF: /fix/g/camm/atlas3-3.6.0/CONFIG/ARCHS/P4SSE2/gcc/gemm ARCHDEF : /fix/g/camm/atlas3-3.6.0/CONFIG/ARCHS/P4SSE2/gcc/misc F2CDEFS : -DAdd__ -DStringSunStyle CACHEEDGE: 1048576 F77 : /usr/bin/g77, version GNU Fortran (GCC) 3.3.3 20031229 (prerelease) (Debian) F77FLAGS : -fomit-frame-pointer -O CC : /usr/bin/gcc, version gcc (GCC) 3.3.3 20031229 (prerelease) (Debian) CC FLAGS : -fomit-frame-pointer -O3 -funroll-all-loops MCC : /usr/bin/gcc, version gcc (GCC) 3.3.3 20031229 (prerelease) (Debian) MCCFLAGS : -fomit-frame-pointer -O success! That was intended to be a question. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] atlas not found, why?
On Fri, Nov 7, 2008 at 1:48 AM, David Cournapeau [EMAIL PROTECTED] wrote: It works for me on Intrepid (64 bits). Did you install libatlas3gf-base-dev ? (the names changed in intrepid). I fear I am overlooking something obvious. $ sudo aptitude search libatlas p libatlas-3dnow-dev - Automatically Tuned Linear Algebra Software,3dnow static v libatlas-3gf.so - i libatlas-base-dev - Automatically Tuned Linear Algebra Software,generic static p libatlas-cpp-0.6-1 - The protocol library of the World Forge project - runtime libs p libatlas-cpp-0.6-1-dbg - The protocol library of the World Forge project - debugging libs p libatlas-cpp-0.6-dev- The protocol library of the World Forge project - header files p libatlas-cpp-doc- The protocol library of the World Forge project - documentation p libatlas-doc- Automatically Tuned Linear Algebra Software,documentation i A libatlas-headers- Automatically Tuned Linear Algebra Software,C header files p libatlas-sse-dev- Automatically Tuned Linear Algebra Software,SSE1 static i libatlas-sse2-dev - Automatically Tuned Linear Algebra Software,SSE2 static p libatlas-test - Automatically Tuned Linear Algebra Software,test programs v libatlas.so.3 - v libatlas.so.3gf - p libatlas3gf-3dnow - Automatically Tuned Linear Algebra Software,3dnow shared i A libatlas3gf-base- Automatically Tuned Linear Algebra Software,generic shared p libatlas3gf-sse - Automatically Tuned Linear Algebra Software,SSE1 shared i libatlas3gf-sse2- Automatically Tuned Linear Algebra Software,SSE2 shared It looks like I have the important ones: libatlas-base-dev libatlas-headers libatlas-sse2-dev libatlas3gf-base libatlas3gf-sse2 But I don't see libatlas3gf-base-dev anywhere. Is this on a special repository? My sources.list is: # Intrepid Final Release Repository deb http://archive.ubuntu.com/ubuntu intrepid main restricted universe multiverse deb-src http://archive.ubuntu.com/ubuntu intrepid main restricted universe multiverse #Added by software-properties # Intrepid Security Updates deb http://archive.ubuntu.com/ubuntu intrepid-security main restricted universe multiverse deb-src http://archive.ubuntu.com/ubuntu intrepid-security main restricted universe multiverse #Added by software-properties # Intrepid Bugfix Updates deb http://archive.ubuntu.com/ubuntu intrepid-updates main restricted universe multiverse deb-src http://archive.ubuntu.com/ubuntu intrepid-updates main restricted universe multiverse #Added by software-properties I suppose it is possible that something is messed up on my end since I upgraded from 8.04. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] atlas not found, why?
On Fri, Nov 7, 2008 at 1:26 AM, David Cournapeau [EMAIL PROTECTED] wrote: David Cournapeau wrote: Ok, I took a brief look at this: I forgot that Ubuntu and Debian added an aditional library suffix to libraries depending on gfortran ABI. I added support for this in numpy.distutils - which was looking for libraries explicitly; could you retry *without* a site.cfg ? It should work, now, And it won't, because I am afraid the Ubuntu atlas package is broken in 8.10... They use the gfortran ABI, but they built the fortran wrappers with g77, according to ATL_buildinfo. https://bugs.launchpad.net/ubuntu/+source/atlas/+bug/295051 I would strongly advise using atlas on Ubuntu 8.10 until this bug is solved: it means any numpy/scipy code using linear algebra is potentially broken (segfault, wrong results). Intended: '''I would strongly advise *against* using atlas on Ubuntu 8.10.''' :-) That the fortran wrappers were compiled using g77 is also apparent via what is printed out during setup when ATLAS is detected: gcc -pthread _configtest.o -L/usr/lib/atlas -llapack -lblas -o _configtest ATLAS version 3.6.0 built by root on Fri Jan 9 15:57:20 UTC 2004: UNAME: Linux intech67 2.4.20 #1 SMP Fri Jan 10 18:29:51 EST 2003 i686 GNU/Linux INSTFLG : MMDEF: /fix/g/camm/atlas3-3.6.0/CONFIG/ARCHS/P4SSE2/gcc/gemm ARCHDEF : /fix/g/camm/atlas3-3.6.0/CONFIG/ARCHS/P4SSE2/gcc/misc F2CDEFS : -DAdd__ -DStringSunStyle CACHEEDGE: 1048576 F77 : /usr/bin/g77, version GNU Fortran (GCC) 3.3.3 20031229 (prerelease) (Debian) F77FLAGS : -fomit-frame-pointer -O CC : /usr/bin/gcc, version gcc (GCC) 3.3.3 20031229 (prerelease) (Debian) CC FLAGS : -fomit-frame-pointer -O3 -funroll-all-loops MCC : /usr/bin/gcc, version gcc (GCC) 3.3.3 20031229 (prerelease) (Debian) MCCFLAGS : -fomit-frame-pointer -O success! And the problem with this, as you've mentioned before, is that g77 and gfortran are not abi compatible. But isn't this issue separate from the autodetection of atlas without a site.cfg? With r5986, atlas is still only detected if I declare ATLAS: $ ATLAS=/usr/lib python setup.py build versus $ unset ATLAS; python setup.py build ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] New ufuncs
On Thu, Nov 6, 2008 at 1:48 PM, Charles R Harris [EMAIL PROTECTED] wrote: What is your particular interest in these other bases and why would they be better than working in base e and converting at the end? The interest is in information theory, where quantities are (standardly) represented in bits. So log2 quantities are often stored by the user and then passed into functions or classes. The main reason I'd like to shy away from conversions is that I also make use of generators/iterators and having next() convert to bits before each yield is not ideal (as these things are often slow enough and will be called many times). The only one I could see really having a fast implementation is log2. No disagreement here :) ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] New ufuncs
On Thu, Nov 6, 2008 at 2:36 PM, Charles R Harris [EMAIL PROTECTED] wrote: I could add exp2, log2, and logaddexp2 pretty easily. Almost too easily, I don't want to clutter up numpy with a lot of functions. However, if there is a community for these functions I will put them in. I worry about clutter as well. Note that scipy provides log2 and exp2 already (scipy.special). So I think only logaddexp2 would be needed and (eventually) logdotexp2. Maybe scipy.special is a better place than in numpy? Then perhaps the clutter could be avoided, though I'm probably not the best one to ask for advice on this. I will definitely use the functions and I suspect many others will as well---wherever they are placed. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] New ufuncs
On Wed, Nov 5, 2008 at 12:00 PM, Charles R Harris [EMAIL PROTECTED] wrote: Hmm I wonder if the base function should be renamed logaddexp, then logsumexp would apply to the reduce method. Thoughts? As David mentioned, logsumexp is probably the traditional name, but as the earlier link shows, it also goes by logadd. Given the distinction between add (a ufunc) and sum (something done over an axis) within numpy, it seems that logadd or logaddexp is probably a more fitting name. So long as it is documented, I doubt it matters much though... ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] New ufuncs
On Tue, Nov 4, 2008 at 9:37 PM, Anne Archibald [EMAIL PROTECTED] wrote: 2008/11/5 Charles R Harris [EMAIL PROTECTED]: Hi All, I'm thinking of adding some new ufuncs. Some possibilities are expadd(a,b) = exp(a) + exp(b) -- For numbers stored as logs: Surely this should be log(exp(a)+exp(b))? That would be extremely useful, yes. +1 But shouldn't it be called 'logadd', for adding values which are stored as logs? http://www.lri.fr/~pierres/donn%E9es/save/these/torch/docs/manual/logAdd.html I would also really enjoy a logdot function, to be used when working with arrays whose elements are log values. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] atlas not found, why?
Numpy doesn't seem to be finding my atlas install. Have I done something wrong or misunderstood? $ cd /usr/lib $ ls libatlas* libatlas.a libatlas.so libatlas.so.3gf libatlas.so.3gf.0 $ ls libf77* libf77blas.a libf77blas.so libf77blas.so.3gf libf77blas.so.3gf.0 $ ls libcblas* libcblas.a libcblas.so libcblas.so.3gf libcblas.so.3gf.0 $ ls liblapack* liblapack-3.a liblapack.aliblapack_atlas.so liblapack_atlas.so.3gf.0 liblapackgf-3.so liblapack.so.3gf liblapack-3.so liblapack_atlas.a liblapack_atlas.so.3gf liblapackgf-3.a liblapack.so liblapack.so.3gf.0 Since these are all in the standard locations, I am building without a site.cfg. Here is the beginning info: Running from numpy source directory. non-existing path in 'numpy/distutils': 'site.cfg' F2PY Version 2_5972 blas_opt_info: blas_mkl_info: libraries mkl,vml,guide not found in /usr/local/lib libraries mkl,vml,guide not found in /usr/lib NOT AVAILABLE atlas_blas_threads_info: Setting PTATLAS=ATLAS NOT AVAILABLE atlas_blas_info: NOT AVAILABLE /tmp/numpy/numpy/distutils/system_info.py:1340: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. warnings.warn(AtlasNotFoundError.__doc__) blas_info: libraries blas not found in /usr/local/lib FOUND: libraries = ['blas'] library_dirs = ['/usr/lib'] language = f77 FOUND: libraries = ['blas'] library_dirs = ['/usr/lib'] define_macros = [('NO_ATLAS_INFO', 1)] language = f77 lapack_opt_info: lapack_mkl_info: mkl_info: libraries mkl,vml,guide not found in /usr/local/lib libraries mkl,vml,guide not found in /usr/lib NOT AVAILABLE NOT AVAILABLE atlas_threads_info: Setting PTATLAS=ATLAS numpy.distutils.system_info.atlas_threads_info NOT AVAILABLE atlas_info: numpy.distutils.system_info.atlas_info NOT AVAILABLE /tmp/numpy/numpy/distutils/system_info.py:1247: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. warnings.warn(AtlasNotFoundError.__doc__) lapack_info: libraries lapack not found in /usr/local/lib FOUND: libraries = ['lapack'] library_dirs = ['/usr/lib'] language = f77 FOUND: libraries = ['lapack', 'blas'] library_dirs = ['/usr/lib'] define_macros = [('NO_ATLAS_INFO', 1)] language = f77 ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] atlas not found, why?
On Mon, Nov 3, 2008 at 10:46 AM, T J [EMAIL PROTECTED] wrote: Since these are all in the standard locations, I am building without a site.cfg. Here is the beginning info: Apparently, this is not enough. Only if I also set the ATLAS environment variable am I able to get this working as expected. So while I now have this working, I'd still like to understand why this is necessary.This is for Ubuntu 8.10. My previous experience was that no site.cfg was necessary. This is still the case, but you also need ATLAS defined. I was also expecting that the following site.cfg would avoid requiring ATLAS to be defined, but it did not. [DEFAULT] library_dirs = /usr/lib:/usr/lib/sse2 include_dirs = /usr/local/include [blas_opt] libraries = f77blas, cblas, atlas [lapack_opt] libraries = lapack, f77blas, cblas, atlas So can someone explain why I *must* define ATLAS. I tried a number of variations on site.cfg and could not get numpy to find atlas with any of them. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Installation Trouble
Sorry, wrong list. On Sun, Nov 2, 2008 at 11:34 AM, T J [EMAIL PROTECTED] wrote: Hi, I'm having trouble installing PyUblas 0.93.1 (same problems from the current git repository). I'm on Ubuntu 8.04 with standard boost packages (1.34.1, I believe). Do you have any suggestions? Thanks! ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combinations of objects (?)
On Mon, Oct 20, 2008 at 2:20 AM, A. G. wrote: one well attached to 2 or more units). Is there any simple way in numpy (scipy?) in which I can get the number of possible combinations of wells attached to the different 3 units, without repetitions? For example, I could have all 60 wells attached to unit number 1, then 59 on unit 1 and 1 on unit 3 and so on... From: http://tinyurl.com/6oeyx8

    def boxings(n, k):
        seq, i = [n]*k + [0], k
        while i:
            yield tuple(seq[i] - seq[i+1] for i in xrange(k))
            i = seq.index(0) - 1
            seq[i:k] = [seq[i] - 1] * (k-i)

Example:

    from scipy import factorial as f
    x = list(boxings(60,3))
    len(x)                        # 1891
    f(60+3-1) / f(60) / f(3-1)    # 1891.2

That thread contains another solution using itertools. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SWIG, typemaps, 2D argout arrays
On Tue, Oct 14, 2008 at 1:02 AM, Sebastian Haase [EMAIL PROTECTED] wrote: b) I don't want to use Python / numpy API code in the C functions I'm wrapping - so I limit myself to input arrays! Since array memory does not distinguish between input or output (assuming there is no copying needed because of alignment or contiguity issues) the only implication of this is that I have to allocate the array outside the C functions. To clarify, are you saying that you don't make any use of numpy.i? The functions I'm writing are not making any explicit use of the Python/numpy API, but I'm sure it shows up in the *_wrap.c file that SWIG creates. input-conversion transparently for my --- do you need to handle non-contiguous arrays ? Probably not. HTH, I'm eager to learn here ;-) Thanks ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] SWIG, typemaps, 2D argout arrays
Hi, I'm new to using SWIG and my reading of numpy_swig.pdf tells me that the following typemap does not exist: (int* ARGOUT_ARRAY2, int DIM1, int DIM2) What is the recommended way to output a 2D array? It seems like I should use: (int* ARGOUT_ARRAY1, int DIM1) and then provide a python function which reshapes the 1D array? Is it correct that there will be insignificant performance disadvantages to this? Also, is there any way to do this in an automated fashion? My first thought is that I'd need to create this function outside of the python module that SWIG creates. Thanks! ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
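In the meantime, the reshape-in-Python route might look like this sketch; the module and function names are hypothetical, and it assumes the wrapped C function fills a length n*m buffer in row-major order:

    import _mymodule  # hypothetical SWIG-generated extension module

    def compute_2d(n, m):
        # the (int* ARGOUT_ARRAY1, int DIM1) typemap hands back a 1-D array;
        # reshape returns a view, so the extra cost is negligible
        flat = _mymodule.compute(n * m)
        return flat.reshape(n, m)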
[Numpy-discussion] Test failures on 2.6
Hi, I'm getting a couple of test failures with Python 2.6, Numpy 1.2.0, Nose 0.10.4: nose version 0.10.4 ..FK.. .../share/home/me/usr/lib/python2.6/site-packages/numpy/lib/tests/test_io.py:68: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert(c.readlines(), ./share/home/me/usr/lib/python2.6/site-packages/numpy/ma/tests/test_core.py:1315: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert(store._mask, True) /home/me/usr/lib/python2.6/site-packages/numpy/ma/tests/test_core.py:1322: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert(store._mask, True) /home/me/usr/lib/python2.6/site-packages/numpy/ma/tests/test_core.py:1989: SyntaxWarning: assertion is always true, perhaps remove parentheses? assert(test.mask, [0,1,0,0,0,0,0,0,0,0]) ...E == ERROR: Tests the min/max functions with explicit outputs -- Traceback (most recent call last): File /home/me/usr/lib/python2.6/site-packages/numpy/ma/tests/test_core.py, line 653, in test_minmax_funcs_with_output result = npfunc(xm,axis=0,out=nout) File /home/me/usr/lib/python2.6/site-packages/numpy/core/fromnumeric.py, line 1525, in amin return amin(axis, out) File /home/me/usr/lib/python2.6/site-packages/numpy/ma/core.py, line 2978, in min np.putmask(out, newmask, np.nan) ValueError: cannot convert float NaN to integer == FAIL: test_umath.TestComplexFunctions.test_against_cmath -- Traceback (most recent call last): File /home/me/usr/lib/python2.6/site-packages/nose-0.10.4-py2.6.egg/nose/case.py, line 182, in runTest self.test(*self.arg) File /home/me/usr/lib/python2.6/site-packages/numpy/core/tests/test_umath.py, line 268, in test_against_cmath assert abs(a - b) atol, %s %s: %s; cmath: %s%(fname,p,a,b) AssertionError: arcsin 2: (1.57079632679-1.31695789692j); cmath: (1.57079632679+1.31695789692j) -- Ran 1726 tests in 8.856s FAILED (KNOWNFAIL=1, errors=1, failures=1) nose.result.TextTestResult run=1726 errors=1 failures=1 ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Log Arrays
Hi, For precision reasons, I almost always need to work with arrays whose elements are log values. My thought was that it would be really neat to have a 'logarray' class implemented in C or as a subclass of the standard array class. Here is a sample of how I'd like to work with these objects:

    x = array([-2,-2,-3], base=2)
    y = array([-1,-2,-inf], base=2)
    z = x + y
    z
    array([-0.415037499279, -1.0, -3])
    z = x * y
    z
    array([-3, -4, -inf])
    z[:2].sum()
    -2.41503749928

This would do a lot for the code I write, and some of the numerical stability issues would be handled more appropriately. For example, the logsum function is frequently handled as: log(x + y) == log(x) + log(1 + exp(log(y) - log(x))) when log(x) > log(y). So the goal of the above function is to return log(x + y) using only the logarithms of x and y. Does this sound like a good idea? ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
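A minimal sketch of that identity, assuming natural logs and ignoring the corner case where both arguments are -inf:

    import numpy as np

    def logadd(logx, logy):
        # log(x + y) from log(x) and log(y): factor out the larger argument,
        # so the remaining exponent is <= 0 and cannot overflow
        big = np.maximum(logx, logy)
        small = np.minimum(logx, logy)
        return big + np.log1p(np.exp(small - big))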
Re: [Numpy-discussion] Log Arrays
On Thu, May 8, 2008 at 12:26 AM, T J [EMAIL PROTECTED] wrote: x = array([-2,-2,-3], base=2) y = array([-1,-2,-inf], base=2) z = x + y z array([-0.415037499279, -1.0, -3]) z = x * y z array([-3, -4, -inf]) z[:2].sum() -2.41503749928 Whoops s/array/logarray/ ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Log Arrays
On 5/8/08, Anne Archibald [EMAIL PROTECTED] wrote: Is logarray really the way to handle it, though? It seems like you could probably get away with providing a logsum ufunc that did the right thing. I mean, what operations does one want to do on logarrays?

add -> logsum
subtract -> ?
multiply -> add
mean -> logsum/N
median -> median
exponentiate to recover normal-space values -> exp
str -> ?

That's about it, as far as my usage goes. Additionally, I would also benefit from:

logdot
logouter

In addition to the elementwise operations, it would be nice to have:

logsum along an axis
logprod along an axis
cumlogsum
cumlogprod

Whether these are through additional ufuncs or through a subclass is not so much of an issue for me---either would be a huge improvement to the current situation. One benefit of a subclass, IMO, is that it maintains the feel of non-log arrays. That is, when I want to multiply two logarrays, I just do x*y, rather than x+y, but I can understand arguments that this might not be desirable. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
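For the logdot wished for here, one hedged sketch along the same max-shift lines (base e; rows or columns that are entirely -inf would need special-casing):

    import numpy as np

    def logdot(la, lb):
        # log(dot(exp(la), exp(lb))): shift each row of la and each column
        # of lb by its maximum so the exponentials stay in a safe range
        ma = la.max(axis=1).reshape(-1, 1)   # (n, 1)
        mb = lb.max(axis=0).reshape(1, -1)   # (1, m)
        return ma + mb + np.log(np.dot(np.exp(la - ma), np.exp(lb - mb)))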