Re: [Numpy-discussion] Type specific sorts: objects, structured arrays, and all that.
On Tue, Jul 10, 2012 at 3:37 AM, Robert Kern robert.k...@gmail.com wrote: On Tue, Jul 10, 2012 at 4:32 AM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, I've been adding type specific sorts for object and structured arrays. It seems that datetime64 and timedelta64 are also not supported. Is there any reason why those types should not be sorted as int64? You need special handling for NaTs to be consistent with how we deal with NaNs in floats. Not sure if this is an issue or not, but different datetime64 objects can be set for different units: http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html#datetime-units. A straight-out comparison of the values as int64 would likely drop the units, correct? On second thought, though, I guess all datetime64's in a numpy array would all have the same units, so it shouldn't matter, right? Just thinking aloud. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
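Ben's thinking-aloud point is right: the unit is carried by the dtype, so every element of a given datetime64 array shares it, and (NaT aside) sorting the underlying int64 values preserves datetime order. A quick illustration, assuming a NaT-free array:

```python
import numpy as np

a = np.array(['2012-07-10', '2012-01-01', '2012-03-05'], dtype='datetime64[D]')

# The unit lives in the dtype, not in the individual elements:
assert a.dtype == np.dtype('datetime64[D]')

# Sorting the raw int64 view gives the same order as sorting the dates
# (this is exactly why NaT, stored as a sentinel int64, needs special
# handling, just like NaN does for floats):
by_int = np.sort(a.view(np.int64)).view(a.dtype)
assert (by_int == np.sort(a)).all()
```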
Re: [Numpy-discussion] Looking for the most important bugs, documentation needs, etc.
On Tue, Jul 10, 2012 at 6:07 AM, Ralf Gommers ralf.gomm...@googlemail.comwrote: On Tue, Jul 10, 2012 at 11:36 AM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Tue, Jul 10, 2012 at 4:20 AM, Six Silberman silberman@gmail.comwrote: Hi all, Some colleagues and I are interested in contributing to numpy. We have a range of backgrounds -- I for example am new to contributing to open source software but have a (small) bit of background in scientific computation, while others have extensive experience contributing to open source projects. We've looked at the issue tracker and submitted a couple patches today but we would be interested to hear what active contributors to the project consider the most pressing, important, and/or interesting needs at the moment. I personally am quite interested in hearing about the most pressing documentation needs (including example code). As for important issues, I think many of them are related to the core of numpy. But there's some more isolated ones, which is probably better to get started. Here are some that are high on my list of things to fix/improve: - Numpy doesn't work well (or at all) on OS X 10.7 when built with llvm-gcc, which is the default compiler on that platform. With Clang it seems to work fine. Same for Scipy. http://projects.scipy.org/numpy/ticket/1951 - We don't have binary installers for Python 3.x on OS X yet. This requires adapting the installer build scripts that work for 2.x. See pavement.py in the base dir of the repo. - Something that's more straightforward: improving test coverage. It's lacking in a number of places; one of the things that comes to mind is that all functions should be tested for correct behavior with empty input. Normally the expected behavior is empty in -- empty out. When that's not tested, we get things like http://projects.scipy.org/numpy/ticket/2078. 
Ticket for empty test coverage: http://projects.scipy.org/numpy/ticket/2007 - There's a large amount of normal bugs, working on any of those would be very helpful too. Hard to say here which ones out of the several hundred are important. It is safe to say though I think that the ones requiring touching the C code are more in need of attention than the pure Python ones. I see a patch for f2py already, and a second ticket opened. This is of course useful, but not too many devs are working on it. Unless Pearu has time to respond this week, it may be hard to get feedback on that topic quickly. Here are some relatively straightforward issues which only require touching Python code: http://projects.scipy.org/numpy/ticket/808 http://projects.scipy.org/numpy/ticket/1968 http://projects.scipy.org/numpy/ticket/1976 http://projects.scipy.org/numpy/ticket/1989 And a Cython one (numpy.random): http://projects.scipy.org/numpy/ticket/1492 I ran into one more patch that I assume one of you just attached: http://projects.scipy.org/numpy/ticket/2074. It's important to understand a little of how our infrastructure works. We changed to git + github last year; submitting patches as pull requests on Github has the lowest overhead for us, and we get notifications. For patches on Trac, we have to manually download and apply them. Plus we don't get notifications, which is quite unhelpful unfortunately. Therefore I suggest using git, and if you can't or you feel that the overhead / learning curve is too large, please ping this mailing list about patches you submit on Trac. Cheers, Ralf By the way, for those who are looking to learn how to use git and github: https://github.com/blog/1183-try-git-in-your-browser Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] build numpy 1.6.2
On Tue, Jul 10, 2012 at 2:45 PM, Prakash Joshi pjo...@numenta.com wrote: Hi All, I built numpy 1.6.2 on linux 64 bit and installed numpy in site-packages, It pass all the test cases of numpy, but I am not sure if this is good build; As I did not specified any fortran compiler while setup, also I do not have fortran compiler on my machine. Thanks Prakash NumPy does not need Fortran for its build. SciPy, however, does. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] build numpy 1.6.2
Prakash,

On Tue, Jul 10, 2012 at 3:26 PM, Prakash Joshi pjo...@numenta.com wrote:

Thanks Ben. Also, I did not specify any of the BLAS, LAPACK, ATLAS libraries; do we need these libraries for numpy?

Need, no, you do not need them in the sense that NumPy does not require them to work. NumPy will work just fine without those libraries. However, if you want them, then that is where the choice of Fortran compiler comes in. Look at the INSTALL.txt file for more detailed instructions.

I simply used the following commands to build:

python setup.py build
python setup.py install --prefix=/usr/local

If the above commands are sufficient, then I hope the same steps to build will work on Mac OSX?

That entirely depends on your development setup on your Mac. I will leave that discussion up to others on the list to answer.

Cheers!
Ben Root
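For checking after the fact whether a finished build picked up BLAS/LAPACK/ATLAS, numpy ships a helper:

```python
import numpy as np

# Prints the BLAS/LAPACK configuration detected at build time; empty or
# "NOT AVAILABLE" sections mean NumPy fell back to its own bundled,
# unoptimized lapack_lite routines. Output varies by version and platform.
np.show_config()
```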
Re: [Numpy-discussion] Remove current 1.7 branch?
On Thursday, July 12, 2012, Thouis (Ray) Jones wrote:

On Thu, Jul 12, 2012 at 1:28 AM, Charles R Harris charlesr.har...@gmail.com wrote:

Hi All, Travis and I agree that it would be appropriate to remove the current 1.7.x branch and branch again after a code freeze. That way we can avoid the pain and potential errors of backports. It is considered bad form to mess with public repositories that way, so another option would be to rename the branch, although I'm not sure how well that would work. Suggestions?

I might be mistaken, but if the branch is merged into master (even if that merge makes no changes), I think it's safe to delete it at that point (and recreate it at a later date with the same name) with regards to remote repositories. It should be fairly easy to test. Ray Jones

No, that is not the case. We had a situation occur awhile back where one of the public branches of mpl got completely messed up. You can't even rename it, since the rename doesn't occur in the pulls and merges. What we ended up doing was creating a brand new branch, v1.0.x-maint, and making sure all the devs knew to switch over to that. You might even go a step further and make a final commit to the bad branch that makes the build fail with a big note explaining what to do.

Ben Root
Re: [Numpy-discussion] Remove current 1.7 branch?
On Thursday, July 12, 2012, Nathaniel Smith wrote:

On Thu, Jul 12, 2012 at 12:48 PM, Benjamin Root ben.r...@ou.edu wrote: On Thursday, July 12, 2012, Thouis (Ray) Jones wrote: On Thu, Jul 12, 2012 at 1:28 AM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, Travis and I agree that it would be appropriate to remove the current 1.7.x branch and branch again after a code freeze. That way we can avoid the pain and potential errors of backports. It is considered bad form to mess with public repositories that way, so another option would be to rename the branch, although I'm not sure how well that would work. Suggestions? I might be mistaken, but if the branch is merged into master (even if that merge makes no changes), I think it's safe to delete it at that point (and recreate it at a later date with the same name) with regards to remote repositories. It should be fairly easy to test. Ray Jones No, that is not the case. We had a situation occur awhile back where one of the public branches of mpl got completely messed up. You can't even rename it since the rename doesn't occur in the pulls and merges. What we ended up doing was creating a brand new branch v1.0.x-maint and making sure all the devs knew to switch over to that. You might even go a step further and make a final commit to the bad branch that makes the build fail with a big note explaining what to do.

The branch isn't bad, it's just out of date. So long as the new version of the branch has the current version of the branch in its ancestry, then everything will be fine.

Option 1:

git checkout master
git merge maint1.7.x
git checkout maint1.7.x
git merge master   # will be a fast-forward

Option 2:

git checkout master
git merge maint1.7.x
git branch -d maint1.7.x     # delete the branch
git checkout -b maint1.7.x   # recreate it

In git terms these two options are literally identical; they result in the exact same repo state... -N

Ah, I misunderstood. Then yes, I think this is correct.
Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] use slicing as argument values?
On Thu, Jul 12, 2012 at 3:38 PM, Chao YUE chaoyue...@gmail.com wrote:

Dear all, I want to create a function, and I would like one of the arguments of the function to determine what slicing of a numpy array I want to use. A simple example:

a = np.arange(100).reshape(10,10)

Suppose I want an imaging function to show an image of part of this data:

def show_part_of_data(m, n):
    plt.imshow(a[m, n])

I would like to give m=3:5, n=2:7, so that when I call show_part_of_data(3:5, 2:7), it does plt.imshow(a[3:5, 2:7]). The above example doesn't work in reality, but it illustrates something similar to what I desire: that I can specify what slicing of a numpy array I want by giving values to function arguments. thanks a lot, Chao

What you want to do is create slice objects. a[3:5] is equivalent to:

sl = slice(3, 5)
a[sl]

and a[3:5, 5:14] is equivalent to:

sl = (slice(3, 5), slice(5, 14))
a[sl]

Furthermore, notation such as ::-1 is equivalent to slice(None, None, -1).

I hope this helps!
Ben Root
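The slice-object recipe works on any Python sequence, not just numpy arrays; a minimal sketch (the tuple form in the last comment applies to numpy arrays only):

```python
a = list(range(10))

sl = slice(3, 5)               # same as a[3:5]
assert a[sl] == [3, 4]

rev = slice(None, None, -1)    # same as a[::-1]
assert a[rev] == list(range(9, -1, -1))

# For multidimensional indexing, pack slices into a tuple (numpy arrays only):
idx = (slice(3, 5), slice(2, 7))   # same as arr[3:5, 2:7]
```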
Re: [Numpy-discussion] use slicing as argument values?
On Thu, Jul 12, 2012 at 4:46 PM, Chao YUE chaoyue...@gmail.com wrote:

Hi Ben, it helps a lot. I am nearly finishing a function in a way I think is pythonic. Just one more question, I have:

In [24]: b = np.arange(1,11)

In [25]: b
Out[25]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [26]: b[slice(1)]
Out[26]: array([1])

In [27]: b[slice(4)]
Out[27]: array([1, 2, 3, 4])

In [28]: b[slice(None,4)]
Out[28]: array([1, 2, 3, 4])

so slice(4) is actually slice(None, 4); how can I retrieve exactly a[4] using a slice object? thanks again! Chao

Tricky question. Note the difference between a[4] and a[4:5]. The first returns a scalar, while the second returns an array. The first, though, is not a slice, just an integer. Also, note that the arguments for slice() behave very similarly to the arguments for range() (with some exceptions/differences).

Cheers!
Ben Root

--
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax: 01.69.08.77.16
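The integer-index-versus-length-one-slice distinction in Ben's answer, as a quick sketch (a plain list behaves the same way):

```python
b = list(range(1, 11))

# slice(4) means slice(None, 4), i.e. the first four elements:
assert b[slice(4)] == b[slice(None, 4)] == [1, 2, 3, 4]

# Retrieving the single element at position 4 uses an integer, not a slice:
assert b[4] == 5

# The nearest slice equivalent returns a length-one sequence instead:
assert b[slice(4, 5)] == [5]
```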
Re: [Numpy-discussion] use slicing as argument values?
On Thursday, July 12, 2012, Chao YUE wrote:

Thanks all for the discussion. Actually I am trying to use something like numpy ndarray indexing in the function. Like when I call func(a, '1:3,:,2:4'), it knows I want to retrieve a[1:3,:,2:4], and func(a, '1:3,:,4') for a[1:3,:,4], etc. I am very close now.

# so this function changes the string to a list of slice objects
def convert_string_to_slice(slice_string):
    """provide slice_string as '2:3,:', it will return
    [slice(2, 3, None), slice(None, None, None)]"""
    slice_list = []
    split_slice_string_list = slice_string.split(',')
    for sub_slice_string in split_slice_string_list:
        split_sub = sub_slice_string.split(':')
        if len(split_sub) == 1:
            sub_slice = slice(int(split_sub[0]))
        else:
            if split_sub[0] == '':
                sub1 = None
            else:
                sub1 = int(split_sub[0])
            if split_sub[1] == '':
                sub2 = None
            else:
                sub2 = int(split_sub[1])
            sub_slice = slice(sub1, sub2)
        slice_list.append(sub_slice)
    return slice_list

In [119]: a = np.arange(3*4*5).reshape(3,4,5)

For this it works fine:

In [120]: convert_string_to_slice('1:3,:,2:4')
Out[120]: [slice(1, 3, None), slice(None, None, None), slice(2, 4, None)]

In [121]: a[slice(1, 3, None), slice(None, None, None), slice(2, 4, None)] == a[1:3,:,2:4]
Out[121]:
array([[[ True,  True],
        [ True,  True],
        [ True,  True],
        [ True,  True]],
       [[ True,  True],
        [ True,  True],
        [ True,  True],
        [ True,  True]]], dtype=bool)

And a problem happens when I want to retrieve a single number along a given dimension, because it treats '1:3,:,4' as '1:3,:,:4', as shown below:

In [122]: convert_string_to_slice('1:3,:,4')
Out[122]: [slice(1, 3, None), slice(None, None, None), slice(None, 4, None)]

In [123]: a[1:3,:,4]
Out[123]:
array([[24, 29, 34, 39],
       [44, 49, 54, 59]])

In [124]: a[slice(1, 3, None), slice(None, None, None), slice(None, 4, None)]
Out[124]:
array([[[20, 21, 22, 23],
        [25, 26, 27, 28],
        [30, 31, 32, 33],
        [35, 36, 37, 38]],
       [[40, 41, 42, 43],
        [45, 46, 47, 48],
        [50, 51, 52, 53],
        [55, 56, 57, 58]]])

Then I have a function:

# this function retrieves data from ndarray a by specifying slice_string:
def retrieve_data(a, slice_string):
    slice_list = convert_string_to_slice(slice_string)
    return a[*slice_list]

In the last line of the function retrieve_data I have a problem; I get an invalid syntax error:

    return a[*slice_list]
             ^
SyntaxError: invalid syntax

I hope it's not too long, please comment as you like. Thanks a lot, Chao

I won't comment on the wisdom of your approach, but for your very last part, don't try unpacking the slice list. Also, I think it has to be a tuple, but I could be wrong on that.

Ben Root
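One possible fix for both problems — treating a bare number as an integer index rather than slice(n), and returning a tuple so that no unpacking is needed — sketched below. This is an illustration along the lines of Ben's advice, not Chao's final code:

```python
import numpy as np

def convert_string_to_slice(slice_string):
    """Turn '1:3,:,4' into (slice(1, 3, None), slice(None, None, None), 4)."""
    parts = []
    for sub in slice_string.split(','):
        pieces = sub.split(':')
        if len(pieces) == 1:
            parts.append(int(pieces[0]))        # bare integer index, not slice(n)
        else:
            start = int(pieces[0]) if pieces[0] else None
            stop = int(pieces[1]) if pieces[1] else None
            step = int(pieces[2]) if len(pieces) > 2 and pieces[2] else None
            parts.append(slice(start, stop, step))
    return tuple(parts)

def retrieve_data(a, slice_string):
    # A tuple index needs no unpacking: a[(sl1, sl2, ...)] == a[sl1, sl2, ...]
    return a[convert_string_to_slice(slice_string)]

a = np.arange(3 * 4 * 5).reshape(3, 4, 5)
assert (retrieve_data(a, '1:3,:,2:4') == a[1:3, :, 2:4]).all()
assert (retrieve_data(a, '1:3,:,4') == a[1:3, :, 4]).all()
```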
Re: [Numpy-discussion] numpy.complex
On Monday, July 23, 2012, OC wrote: It's unPythonic just in the sense that it is unlike every other type constructor in Python. int(x) returns an int, list(x) returns a list, but np.complex64(x) sometimes returns a np.complex64, and sometimes it returns a np.ndarray, depending on what 'x' is. This object factory design pattern adds useful and natural functionality. I can see an argument for deprecating this behaviour altogether and referring people to the np.asarray(x, dtype=complex) form; that would be cleaner and reduce confusion. Don't know if it's worth it, but that's the only cleanup that I can see even being considered for these constructors. From my experience in teaching, I can tell that even beginners have no problem with the fact that complex128(1) returns a scalar and that complex128(r_[1]) returns an array. It seems to be pretty natural. Also, from the duck-typing point of view, both returned values are complex, i.e. provide 'real' and 'imag' attributes and 'conjugate()' method. On the contrary a real confusion is with numpy.complex acting differently than the other numpy.complex*. People do write from numpy import * Yeah, that's what I do very often in interactive ipython sessions. Other than this, people are warned often enough that this shouldn't be used in real programs. Don't be so sure of that. The pylab mode from matplotlib has been both a blessing and a curse. This mode is very popular and for many, it is all they need/want to know. While it has made the transition from other languages easier for many, the polluted namespace comes at a small cost. And it is only going to get worse when moving over to py3k where just about everything is a generator. __builtin__.any can handle generators, but np.any does not. Same goes for several other functions. Note, I do agree with you that the discrepancy needs to be fixed, I just am not sure which way. 
Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
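The any/all point is easy to demonstrate with plain Python; the numpy-version-dependent behavior of np.any on generators is left as a comment rather than asserted:

```python
import builtins

data = [1, 2, 3]

# The builtin consumes a generator lazily, element by element:
assert builtins.any(x > 2 for x in data)
assert not builtins.any(x > 5 for x in data)

# After "from numpy import *" (or pylab mode), the name `any` is rebound
# to numpy's any(), which coerces its argument to an array first; a
# generator does not become an element-wise array that way, so code that
# relied on the builtin's semantics can silently misbehave.
```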
Re: [Numpy-discussion] Synonym standards
On Thu, Jul 26, 2012 at 4:45 PM, Colin J. Williams fn...@ncf.ca wrote: It seems that these standards have been adopted, which is good: The following import conventions are used throughout the NumPy source and documentation: import numpy as np import matplotlib as mpl import matplotlib.pyplot as plt Source: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt Is there some similar standard for PyLab? Thanks, Colin W. Colin, Typically, with pylab mode of matplotlib, you do: from pylab import * This is essentially equivalent to: from numpy import * from matplotlib.pyplot import * Note that the pylab module is actually a part of matplotlib and is a shortcut to provide an environment that is very familiar to Matlab users. Converts are then encouraged to use the imports you mentioned in order to properly utilize python namespaces. I hope that helps! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Synonym standards
On Thu, Jul 26, 2012 at 7:12 PM, Robert Kern robert.k...@gmail.com wrote: On Fri, Jul 27, 2012 at 12:05 AM, Colin J. Williams cjwilliam...@gmail.com wrote: On 26/07/2012 4:57 PM, Benjamin Root wrote: On Thu, Jul 26, 2012 at 4:45 PM, Colin J. Williams fn...@ncf.ca wrote: It seems that these standards have been adopted, which is good: The following import conventions are used throughout the NumPy source and documentation: import numpy as np import matplotlib as mpl import matplotlib.pyplot as plt Source: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt Is there some similar standard for PyLab? Thanks, Colin W. Colin, Typically, with pylab mode of matplotlib, you do: from pylab import * This is essentially equivalent to: from numpy import * from matplotlib.pyplot import * Note that the pylab module is actually a part of matplotlib and is a shortcut to provide an environment that is very familiar to Matlab users. Converts are then encouraged to use the imports you mentioned in order to properly utilize python namespaces. I hope that helps! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion Thanks Ben, I would prefer not to use: from xxx import *, because of the name pollution. The name convention that I copied above facilitates avoiding the pollution. In the same spirit, I've used: import pylab as plb But in that same spirit, using np and plt separately is preferred. Namespaces are one honking great idea -- let's do more of those! from http://www.python.org/dev/peps/pep-0020/ Absolutely correct. The namespace pollution is exactly why we encourage converts to move over from the pylab mode to separating out the numpy and pyplot namespaces. There are very subtle issues that arise when doing from pylab import * such as overriding the built-in any and all. 
The only real advantage of the pylab mode over separating out numpy and pyplot is conciseness, which many matlab users expect at first. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in numpy.where?
On Thu, Jul 26, 2012 at 2:33 PM, Phil Hodge ho...@stsci.edu wrote:

On a Linux machine:

uname -srvop
Linux 2.6.18-308.8.2.el5 #1 SMP Tue May 29 11:54:17 EDT 2012 x86_64 GNU/Linux

this example shows an apparent problem with the where function:

Python 2.7.1 (r271:86832, Dec 21 2010, 11:19:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> print np.__version__
1.5.1
>>> net = np.zeros(3, dtype='f4')
>>> net[1] = 0.00458849
>>> net[2] = 0.605202
>>> max_net = net.max()
>>> test = np.where(net <= 0., max_net, net)
>>> print test
[ -2.23910537e-35   4.58848989e-03   6.05202019e-01]

When I specified the dtype for net as 'f8', test[0] was 3.46244974e+68. It worked as expected (i.e. test[0] should be 0.605202) when I specified float(max_net) as the second argument to np.where.

Phil

Confirmed with version 1.7.0.dev-470c857 on a CentOS6 64-bit machine. Strange indeed. Breaking it down further:

>>> res = (net <= 0.)
>>> print res
[ True False False]
>>> np.where(res, max_net, net)
array([ -2.23910537e-35,   4.58848989e-03,   6.05202019e-01], dtype=float32)

Very strange...

Ben Root
Re: [Numpy-discussion] bug in numpy.where?
On Fri, Jul 27, 2012 at 3:58 PM, Andreas Mueller amuel...@ais.uni-bonn.dewrote: Hi Everybody. The bug is that no error is raised, right? The docs say where(condition, [x, y]) x, y : array_like, optional Values from which to choose. `x` and `y` need to have the same shape as `condition` In the example you gave, x was a scalar. Cheers, Andy Hmm, that is incorrect, I believe. I have used a scalar before. Maybe it works because a scalar is broadcastable to the same shape as any other N-dim array? If so, then the wording of that docstring needs to be fixed. No, I think Christopher hit it on the head. For whatever reason, the endian-ness somewhere is not being respected and causes a byte-swapped version to show up. How that happens, though, is beyond me. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANN: NumPy 1.7.0b1 release
On Tue, Aug 21, 2012 at 12:24 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:

Hi, I'm pleased to announce the availability of the first beta release of NumPy, 1.7.0b1. Sources and binary installers can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.7.0b1/

Please test this release and report any issues on the numpy-discussion mailing list. The following problems are known and we'll work on fixing them before the final release:

http://projects.scipy.org/numpy/ticket/2187
http://projects.scipy.org/numpy/ticket/2185
http://projects.scipy.org/numpy/ticket/2066
http://projects.scipy.org/numpy/ticket/1588
http://projects.scipy.org/numpy/ticket/2076
http://projects.scipy.org/numpy/ticket/2101
http://projects.scipy.org/numpy/ticket/2108
http://projects.scipy.org/numpy/ticket/2150
http://projects.scipy.org/numpy/ticket/2189

I would like to thank Ralf for a lot of help with creating binaries and other help for this release. Cheers, Ondrej

At http://docs.scipy.org/doc/numpy/contents.html, it looks like the TOC tree is a bit messed up. For example, I see that masked arrays are listed multiple times, and I think some of the sub-entries for masked arrays show up multiple times within an entry for masked arrays. Some of the bullets render as the wrong glyph instead of dots. Don't know what version that page is generated from, but we might want to double-check that 1.7.0's docs don't have the same problem.

Cheers!
Ben Root
Re: [Numpy-discussion] broadcasting question
On Thursday, August 30, 2012, Neal Becker wrote: I think this should be simple, but I'm drawing a blank I have 2 2d matrixes Matrix A has indexes (i, symbol) Matrix B has indexes (state, symbol) I combined them into a 3d matrix: C = A[:,newaxis,:] + B[newaxis,:,:] where C has indexes (i, state, symbol) That works fine. Now suppose I want to omit B (for debug), like: C = A[:,newaxis,:] In other words, all I want is to add a dimension into A and force it to broadcast along that axis. How do I do that? np.tile would help you there, I think. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
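A sketch of the np.tile suggestion, with made-up shapes; the broadcast_to variant arrived in later NumPy versions (1.10+) and produces the same result without copying:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # indexes (i, symbol)
nstates = 4

# Insert a state axis of length 1, then repeat A along it:
C = np.tile(A[:, np.newaxis, :], (1, nstates, 1))
assert C.shape == (2, nstates, 3)
assert (C[:, 0, :] == A).all() and (C[:, nstates - 1, :] == A).all()

# Later NumPy versions can build a read-only broadcast view instead:
D = np.broadcast_to(A[:, np.newaxis, :], (2, nstates, 3))
assert (D == C).all()
```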
[Numpy-discussion] numpy.ma.MaskedArray.min() makes a copy?
An issue just reported on the matplotlib-users list involved a user who ran out of memory while attempting to do an imshow() on a large array. While this wouldn't be totally unexpected, the user's traceback shows that they ran out of memory before any actual building of the image occurred. Memory usage sky-rocketed when imshow() attempted to determine the min and max of the image.

The input data was a masked array, and it appears that the implementation of min() for masked arrays goes something like this (paraphrasing here):

obj.filled(inf).min()

The idea is that any masked element is set to the largest possible value for its dtype in a copied array of itself, and then a min() is performed on that copied array. I am assuming that max() does the same thing.

Can this be done differently/more efficiently? If the filled approach has to be done, maybe it would be a good idea to make the copy in chunks instead of all at once? Ideally, it would be nice to avoid the copying altogether and utilize some of the special iterators that Mark Wiebe created last year.

Cheers!
Ben Root
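A sketch of the chunked idea mentioned above — a hypothetical helper, not the actual np.ma implementation — that never fills a full-size copy:

```python
import numpy as np

def masked_min_chunked(marr, chunksize=1 << 16):
    """Min of the unmasked elements, touching only one chunk at a time."""
    data = np.ma.getdata(marr).ravel()
    mask = np.ma.getmaskarray(marr).ravel()
    best = None
    for start in range(0, data.size, chunksize):
        d = data[start:start + chunksize]
        m = mask[start:start + chunksize]
        if not m.all():                      # skip fully masked chunks
            lo = d[~m].min()
            best = lo if best is None else min(best, lo)
    return best                              # None if everything was masked

arr = np.ma.masked_greater(np.array([5.0, -2.0, 7.0, -9.0]), 4.0)
assert masked_min_chunked(arr, chunksize=2) == arr.min() == -9.0
```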
[Numpy-discussion] Regression: in-place operations (possibly intentional)
Consider the following code:

import numpy as np
a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
a *= float(255) / 15

In v1.6.x, this yields: array([17, 34, 51, 68, 85], dtype=int16)

But in master, this throws an exception about failing to cast via same_kind. Note that numpy was smart about this operation before; consider:

a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
a *= float(128) / 256

yields: array([0, 1, 1, 2, 2], dtype=int16)

Of course, this is different than if one does it in a non-in-place manner:

np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5

which yields an array with floating point dtype in both versions.

I can appreciate the arguments for preventing this kind of implicit casting between non-same_kind dtypes, but I argue that because the operation is in-place, then I (as the programmer) am explicitly stating that I desire to utilize the current array to store the results of the operation, dtype and all. Obviously, we can't completely turn off this rule (for example, an in-place addition between an integer array and a datetime64 makes no sense), but surely there is some sort of happy medium that would allow these sorts of operations to take place?

Lastly, if it is determined that it is desirable to allow in-place operations to continue working like they have before, I would like to see such a fix in v1.7, because if it isn't in 1.7, then other libraries (such as matplotlib, where this issue was first found) would have to change their code anyway just to be compatible with numpy.

Cheers!
Ben Root
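For reference, the same_kind rule being discussed can be probed directly with np.can_cast:

```python
import numpy as np

# float64 -> int16 crosses kinds, so 'same_kind' forbids it; this is
# exactly what the in-place multiply now trips over:
assert not np.can_cast(np.float64, np.int16, casting='same_kind')
assert np.can_cast(np.float64, np.int16, casting='unsafe')

# Within a kind, lowering precision is still allowed under 'same_kind':
assert np.can_cast(np.float64, np.float32, casting='same_kind')
```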
Re: [Numpy-discussion] Regression: in-place operations (possibly intentional)
On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant tra...@continuum.iowrote: On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote: Consider the following code: import numpy as np a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(255) / 15 In v1.6.x, this yields: array([17, 34, 51, 68, 85], dtype=int16) But in master, this throws an exception about failing to cast via same_kind. Note that numpy was smart about this operation before, consider: a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(128) / 256 yields: array([0, 1, 1, 2, 2], dtype=int16) Of course, this is different than if one does it in a non-in-place manner: np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5 which yields an array with floating point dtype in both versions. I can appreciate the arguments for preventing this kind of implicit casting between non-same_kind dtypes, but I argue that because the operation is in-place, then I (as the programmer) am explicitly stating that I desire to utilize the current array to store the results of the operation, dtype and all. Obviously, we can't completely turn off this rule (for example, an in-place addition between integer array and a datetime64 makes no sense), but surely there is some sort of happy medium that would allow these sort of operations to take place? Lastly, if it is determined that it is desirable to allow in-place operations to continue working like they have before, I would like to see such a fix in v1.7 because if it isn't in 1.7, then other libraries (such as matplotlib, where this issue was first found) would have to change their code anyway just to be compatible with numpy. I agree that in-place operations should allow different casting rules. There are different opinions on this, of course, but generally this is how NumPy has worked in the past. 
We did decide to change the default casting rule to same_kind but making an exception for in-place seems reasonable. I think that in these cases same_kind will flag what are most likely programming errors and sloppy code. It is easy to be explicit and doing so will make the code more readable because it will be immediately obvious what the multiplicand is without the need to recall what the numpy casting rules are in this exceptional case. IISTR several mentions of this before (Gael?), and in some of those cases it turned out that bugs were being turned up. Catching bugs with minimal effort is a good thing. Chuck True, it is quite likely to be a programming error, but then again, there are many cases where it isn't. Is the problem strictly that we are trying to downcast the float to an int, or is it that we are trying to downcast to a lower precision? Is there a way for one to explicitly relax the same_kind restriction? Thanks, Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
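On the closing question: yes — spelling the in-place operation as an explicit ufunc call lets you pass casting='unsafe', which relaxes the same_kind restriction for just that one operation (a sketch using the example from the original post):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5], dtype=np.int16)

# Equivalent of `a *= 255.0 / 15`, with the same_kind check explicitly relaxed:
np.multiply(a, 255.0 / 15, out=a, casting='unsafe')

assert a.dtype == np.int16            # still stored in the original array
assert a.tolist() == [17, 34, 51, 68, 85]
```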
Re: [Numpy-discussion] numpy.ma.MaskedArray.min() makes a copy?
On Fri, Sep 7, 2012 at 12:05 PM, Nathaniel Smith n...@pobox.com wrote: On 7 Sep 2012 14:38, Benjamin Root ben.r...@ou.edu wrote: An issue just reported on the matplotlib-users list involved a user who ran out of memory while attempting to do an imshow() on a large array. While this wouldn't be totally unexpected, the user's traceback shows that they ran out of memory before any actual building of the image occurred. Memory usage sky-rocketed when imshow() attempted to determine the min and max of the image. The input data was a masked array, and it appears that the implementation of min() for masked arrays goes something like this (paraphrasing here): obj.filled(inf).min() The idea is that any masked element is set to the largest possible value for its dtype in a copied array of itself, and then a min() is performed on that copied array. I am assuming that max() does the same thing. Can this be done differently/more efficiently? If the filled approach has to be done, maybe it would be a good idea to make the copy in chunks instead of all at once? Ideally, it would be nice to avoid the copying altogether and utilize some of the special iterators that Mark Wiebe created last year. I think what you're looking for is where= support for ufunc.reduce. This isn't implemented yet but at least it's straightforward in principle... otherwise I don't know anything better than reimplementing .min() by hand. -n Yes, it was the where= support that I was thinking of. I take it that it was pulled out of the 1.7 branch with the rest of the NA stuff? Ben Root
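Pending where= support in ufunc.reduce, the chunked-copy idea Ben floats can be sketched as follows. `chunked_masked_min` is a hypothetical helper, not a numpy.ma API; the point is that only one chunk-sized temporary is alive at a time instead of a full-size filled() copy:

```python
import numpy as np
import numpy.ma as ma

def chunked_masked_min(marr, chunk=4096):
    """Minimum of a masked array without filling a full-size copy.

    Hypothetical helper: walks the flattened data in fixed-size chunks,
    so peak extra memory is O(chunk) rather than O(marr.size).
    """
    data = marr.data.ravel()
    mask = ma.getmaskarray(marr).ravel()
    best = None
    for start in range(0, data.size, chunk):
        m = mask[start:start + chunk]
        if m.all():  # chunk is fully masked: nothing to contribute
            continue
        d = data[start:start + chunk][~m]  # small, at most chunk-sized temporary
        cmin = d.min()
        best = cmin if best is None else min(best, cmin)
    return ma.masked if best is None else best

x = ma.masked_greater(np.arange(10), 5)
print(chunked_masked_min(x))  # 0
```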
Re: [Numpy-discussion] Regression: in-place operations (possibly intentional)
On Tue, Sep 18, 2012 at 3:19 PM, Ralf Gommers ralf.gomm...@gmail.comwrote: On Tue, Sep 18, 2012 at 9:13 PM, Benjamin Root ben.r...@ou.edu wrote: On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root ben.r...@ou.edu wrote: On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant tra...@continuum.iowrote: On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote: Consider the following code: import numpy as np a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(255) / 15 In v1.6.x, this yields: array([17, 34, 51, 68, 85], dtype=int16) But in master, this throws an exception about failing to cast via same_kind. Note that numpy was smart about this operation before, consider: a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(128) / 256 yields: array([0, 1, 1, 2, 2], dtype=int16) Of course, this is different than if one does it in a non-in-place manner: np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5 which yields an array with floating point dtype in both versions. I can appreciate the arguments for preventing this kind of implicit casting between non-same_kind dtypes, but I argue that because the operation is in-place, then I (as the programmer) am explicitly stating that I desire to utilize the current array to store the results of the operation, dtype and all. Obviously, we can't completely turn off this rule (for example, an in-place addition between integer array and a datetime64 makes no sense), but surely there is some sort of happy medium that would allow these sort of operations to take place? 
Lastly, if it is determined that it is desirable to allow in-place operations to continue working like they have before, I would like to see such a fix in v1.7 because if it isn't in 1.7, then other libraries (such as matplotlib, where this issue was first found) would have to change their code anyway just to be compatible with numpy. I agree that in-place operations should allow different casting rules. There are different opinions on this, of course, but generally this is how NumPy has worked in the past. We did decide to change the default casting rule to same_kind but making an exception for in-place seems reasonable. I think that in these cases same_kind will flag what are most likely programming errors and sloppy code. It is easy to be explicit and doing so will make the code more readable because it will be immediately obvious what the multiplicand is without the need to recall what the numpy casting rules are in this exceptional case. IISTR several mentions of this before (Gael?), and in some of those cases it turned out that bugs were being turned up. Catching bugs with minimal effort is a good thing. Chuck True, it is quite likely to be a programming error, but then again, there are many cases where it isn't. Is the problem strictly that we are trying to downcast the float to an int, or is it that we are trying to downcast to a lower precision? Is there a way for one to explicitly relax the same_kind restriction? I think the problem is down casting across kinds, with the result that floats are truncated and the imaginary parts of imaginaries might be discarded. That is, the value, not just the precision, of the rhs changes. So I'd favor an explicit cast in code like this, i.e., cast the rhs to an integer. It is true that this forces downstream to code up to a higher standard, but I don't see that as a bad thing, especially if it exposes bugs. And it isn't difficult to fix. 
Chuck Mind you, in my case, casting the rhs as an integer before doing the multiplication would be a bug, since our value for the rhs is usually between zero and one. Multiplying first by the integer numerator before dividing by the integer denominator would likely cause issues with overflowing the 16 bit integer. Then you'd have to do a = np.array([1, 2, 3, 4, 5], dtype=np.int16) np.multiply(a, 0.5, out=a, casting='unsafe') array([0, 1, 1, 2, 2], dtype=int16) Ralf That is exactly what I am looking for! When did the casting kwarg come about? I am unfamiliar with it. Ben Root
Re: [Numpy-discussion] Regression: in-place operations (possibly intentional)
On Tue, Sep 18, 2012 at 3:25 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Sep 18, 2012 at 1:13 PM, Benjamin Root ben.r...@ou.edu wrote: On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root ben.r...@ou.edu wrote: On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant tra...@continuum.iowrote: On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote: Consider the following code: import numpy as np a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(255) / 15 In v1.6.x, this yields: array([17, 34, 51, 68, 85], dtype=int16) But in master, this throws an exception about failing to cast via same_kind. Note that numpy was smart about this operation before, consider: a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(128) / 256 yields: array([0, 1, 1, 2, 2], dtype=int16) Of course, this is different than if one does it in a non-in-place manner: np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5 which yields an array with floating point dtype in both versions. I can appreciate the arguments for preventing this kind of implicit casting between non-same_kind dtypes, but I argue that because the operation is in-place, then I (as the programmer) am explicitly stating that I desire to utilize the current array to store the results of the operation, dtype and all. Obviously, we can't completely turn off this rule (for example, an in-place addition between integer array and a datetime64 makes no sense), but surely there is some sort of happy medium that would allow these sort of operations to take place? 
Lastly, if it is determined that it is desirable to allow in-place operations to continue working like they have before, I would like to see such a fix in v1.7 because if it isn't in 1.7, then other libraries (such as matplotlib, where this issue was first found) would have to change their code anyway just to be compatible with numpy. I agree that in-place operations should allow different casting rules. There are different opinions on this, of course, but generally this is how NumPy has worked in the past. We did decide to change the default casting rule to same_kind but making an exception for in-place seems reasonable. I think that in these cases same_kind will flag what are most likely programming errors and sloppy code. It is easy to be explicit and doing so will make the code more readable because it will be immediately obvious what the multiplicand is without the need to recall what the numpy casting rules are in this exceptional case. IISTR several mentions of this before (Gael?), and in some of those cases it turned out that bugs were being turned up. Catching bugs with minimal effort is a good thing. Chuck True, it is quite likely to be a programming error, but then again, there are many cases where it isn't. Is the problem strictly that we are trying to downcast the float to an int, or is it that we are trying to downcast to a lower precision? Is there a way for one to explicitly relax the same_kind restriction? I think the problem is down casting across kinds, with the result that floats are truncated and the imaginary parts of imaginaries might be discarded. That is, the value, not just the precision, of the rhs changes. So I'd favor an explicit cast in code like this, i.e., cast the rhs to an integer. It is true that this forces downstream to code up to a higher standard, but I don't see that as a bad thing, especially if it exposes bugs. And it isn't difficult to fix. 
Chuck Mind you, in my case, casting the rhs as an integer before doing the multiplication would be a bug, since our value for the rhs is usually between zero and one. Multiplying first by the integer numerator before dividing by the integer denominator would likely cause issues with overflowing the 16 bit integer. For the case in point I'd do In [1]: a = np.array([1, 2, 3, 4, 5], dtype=np.int16) In [2]: a //= 2 In [3]: a Out[3]: array([0, 1, 1, 2, 2], dtype=int16) Although I expect you would want something different in practice. But the current code already looks fragile to me and I think it is a good thing you are taking a closer look at it. If you really intend going through a float, then it should be something like a = (a*(float(128)/256)).astype(int16) Chuck And thereby losing the memory benefit of an in-place multiplication? That is sort of the point of all this. We are using 16 bit integers because we wanted to be as efficient as possible and didn't need anything larger. Note, that is what we changed the code to, I am just wondering if we are being too cautious. The casting kwarg looks to be what I might want, though it isn't as clean as just writing an *= statement. Ben Root
Re: [Numpy-discussion] Regression: in-place operations (possibly intentional)
On Tue, Sep 18, 2012 at 4:42 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Sep 18, 2012 at 2:33 PM, Travis Oliphant tra...@continuum.iowrote: On Sep 18, 2012, at 2:44 PM, Charles R Harris wrote: On Tue, Sep 18, 2012 at 1:35 PM, Benjamin Root ben.r...@ou.edu wrote: On Tue, Sep 18, 2012 at 3:25 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Sep 18, 2012 at 1:13 PM, Benjamin Root ben.r...@ou.edu wrote: On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root ben.r...@ou.eduwrote: On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant tra...@continuum.io wrote: On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote: Consider the following code: import numpy as np a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(255) / 15 In v1.6.x, this yields: array([17, 34, 51, 68, 85], dtype=int16) But in master, this throws an exception about failing to cast via same_kind. Note that numpy was smart about this operation before, consider: a = np.array([1, 2, 3, 4, 5], dtype=np.int16) a *= float(128) / 256 yields: array([0, 1, 1, 2, 2], dtype=int16) Of course, this is different than if one does it in a non-in-place manner: np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5 which yields an array with floating point dtype in both versions. I can appreciate the arguments for preventing this kind of implicit casting between non-same_kind dtypes, but I argue that because the operation is in-place, then I (as the programmer) am explicitly stating that I desire to utilize the current array to store the results of the operation, dtype and all. Obviously, we can't completely turn off this rule (for example, an in-place addition between integer array and a datetime64 makes no sense), but surely there is some sort of happy medium that would allow these sort of operations to take place? 
Lastly, if it is determined that it is desirable to allow in-place operations to continue working like they have before, I would like to see such a fix in v1.7 because if it isn't in 1.7, then other libraries (such as matplotlib, where this issue was first found) would have to change their code anyway just to be compatible with numpy. I agree that in-place operations should allow different casting rules. There are different opinions on this, of course, but generally this is how NumPy has worked in the past. We did decide to change the default casting rule to same_kind but making an exception for in-place seems reasonable. I think that in these cases same_kind will flag what are most likely programming errors and sloppy code. It is easy to be explicit and doing so will make the code more readable because it will be immediately obvious what the multiplicand is without the need to recall what the numpy casting rules are in this exceptional case. IISTR several mentions of this before (Gael?), and in some of those cases it turned out that bugs were being turned up. Catching bugs with minimal effort is a good thing. Chuck True, it is quite likely to be a programming error, but then again, there are many cases where it isn't. Is the problem strictly that we are trying to downcast the float to an int, or is it that we are trying to downcast to a lower precision? Is there a way for one to explicitly relax the same_kind restriction? I think the problem is down casting across kinds, with the result that floats are truncated and the imaginary parts of imaginaries might be discarded. That is, the value, not just the precision, of the rhs changes. So I'd favor an explicit cast in code like this, i.e., cast the rhs to an integer. It is true that this forces downstream to code up to a higher standard, but I don't see that as a bad thing, especially if it exposes bugs. And it isn't difficult to fix. 
Chuck Mind you, in my case, casting the rhs as an integer before doing the multiplication would be a bug, since our value for the rhs is usually between zero and one. Multiplying first by the integer numerator before dividing by the integer denominator would likely cause issues with overflowing the 16 bit integer. For the case in point I'd do In [1]: a = np.array([1, 2, 3, 4, 5], dtype=np.int16) In [2]: a //= 2 In [3]: a Out[3]: array([0, 1, 1, 2, 2], dtype=int16) Although I expect you would want something different in practice. But the current code already looks fragile to me and I think it is a good thing you are taking a closer look at it. If you really intend going through a float, then it should be something like a = (a*(float(128)/256)).astype(int16) Chuck And thereby losing the memory benefit of an in-place multiplication? What makes you think you are getting that? I'd have
Re: [Numpy-discussion] specifying numpy as dependency in your project, install_requires
On Fri, Sep 21, 2012 at 4:19 PM, Travis Oliphant tra...@continuum.io wrote: On Sep 21, 2012, at 3:13 PM, Ralf Gommers wrote: Hi, An issue I keep running into is that packages use: install_requires = ["numpy"] or install_requires = ['numpy >= 1.6'] in their setup.py. This simply doesn't work a lot of the time. I actually filed a bug against patsy for that ( https://github.com/pydata/patsy/issues/5), but Nathaniel is right that it would be better to bring it up on this list. The problem is that if you use pip, it doesn't detect numpy (may work better if you had installed numpy with setuptools) and tries to automatically install or upgrade numpy. That won't work if users don't have the right compiler. Just as bad would be that it does work, and the user didn't want to upgrade for whatever reason. This isn't just my problem; at Wes' pandas tutorial at EuroScipy I saw other people have the exact same problem. My recommendation would be to not use install_requires for numpy, but simply do something like this in setup.py: try: import numpy except ImportError: raise ImportError(my_package requires numpy) or try: from numpy.version import short_version as npversion except ImportError: raise ImportError(my_package requires numpy) if npversion < '1.6': raise ImportError(Numpy version is %s; required is version >= 1.6 % npversion) Any objections, better ideas? Is there a good place to put it in the numpy docs somewhere? I agree. I would recommend against using install requires. -Travis Why? I have personally never had an issue with this. The only way I could imagine that this wouldn't work is if numpy was installed via some other means and there wasn't an entry in the easy-install.pth (or whatever equivalent pip uses). If pip is having a problem detecting numpy, then that is a bug that needs fixing somewhere. 
As for packages getting updated unintentionally, easy_install and pip both require an argument to upgrade any existing packages (I think -U), so I am not sure how you are running into such a situation. I have found install_requires to be a powerful feature in my setup.py scripts, and I have seen no reason to discourage it. Perhaps I am the only one? Ben Root
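Ralf's runtime check can be made slightly more robust than a plain string comparison, which would sort '1.10' before '1.6'. A sketch, where `version_tuple` is an illustrative helper and the message strings are placeholders:

```python
# Sketch of the runtime check suggested for setup.py. A plain string
# comparison of version numbers is subtly wrong ('1.10' < '1.6'
# lexicographically), so compare numeric components instead.
def version_tuple(v):
    """'1.6.2' -> (1, 6, 2); a trailing suffix like 'rc1' is ignored."""
    parts = []
    for p in v.split('.'):
        digits = ''
        for c in p:
            if c.isdigit():
                digits += c
            else:
                break  # stop at the first non-numeric character
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

try:
    from numpy.version import short_version as npversion
except ImportError:
    raise ImportError("my_package requires numpy")

if version_tuple(npversion) < (1, 6):
    raise ImportError("numpy version is %s; version >= 1.6 is required"
                      % npversion)
```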
Re: [Numpy-discussion] how to pipe into numpy arrays?
On Wed, Oct 24, 2012 at 3:00 PM, Michael Aye kmichael@gmail.com wrote: As numpy.fromfile seems to require full file object functionalities like seek, I can not use it with the sys.stdin pipe. So how could I stream a binary pipe directly into numpy? I can imagine storing the data in a string and use StringIO but the files are 3.6 GB large, just the binary, and that will most likely be much more as a string object. Reading binary files on disk is NOT the problem, I would like to avoid the temporary file if possible. I haven't tried this myself, but there is a numpy.frombuffer() function as well. Maybe that could be used here? Cheers! Ben Root
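For what it's worth, np.frombuffer() can indeed be combined with chunked reads to stream a non-seekable pipe without ever holding the whole file as one string. A rough sketch: `from_stream` is a hypothetical helper, io.BytesIO stands in for sys.stdin.buffer, and the leftover handling exists because a real pipe may return a partial item:

```python
import io
import numpy as np

def from_stream(stream, dtype, chunk_items=1024):
    """Read a non-seekable binary stream into a 1-D array, chunk by chunk."""
    dtype = np.dtype(dtype)
    chunks = []
    leftover = b""
    while True:
        buf = stream.read(chunk_items * dtype.itemsize)
        if not buf:
            break
        buf = leftover + buf
        # A pipe may deliver a partial item; carry the remainder over.
        usable = len(buf) - (len(buf) % dtype.itemsize)
        chunks.append(np.frombuffer(buf[:usable], dtype=dtype))
        leftover = buf[usable:]
    if not chunks:
        return np.empty(0, dtype=dtype)
    return np.concatenate(chunks)

# Stand-in for sys.stdin.buffer: ten float64 values serialized to bytes.
pipe = io.BytesIO(np.arange(10.0).tobytes())
arr = from_stream(pipe, np.float64, chunk_items=4)
print(arr)  # [0. 1. 2. ... 9.]
```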
[Numpy-discussion] Regression in mpl: AttributeError: incompatible shape for a non-contiguous array
This error started showing up in the test suite for mpl when using numpy master. AttributeError: incompatible shape for a non-contiguous array The tracebacks all point back to various code points where we are trying to set the shape of an array, e.g., offsets.shape = (-1, 2) Those lines haven't changed in a couple of years, and were intended to be done this way to raise an error when reshaping would result in a copy (since we needed to use the original in those places). I don't know how these arrays have become non-contiguous, so I am wondering if there was some sort of attribute that got screwed up somewhere (maybe with views?) Cheers! Ben Root
Re: [Numpy-discussion] Regression in mpl: AttributeError: incompatible shape for a non-contiguous array
On Mon, Oct 29, 2012 at 10:33 AM, Sebastian Berg sebast...@sipsolutions.net wrote: Hey, On Mon, 2012-10-29 at 09:54 -0400, Benjamin Root wrote: This error started showing up in the test suite for mpl when using numpy master. AttributeError: incompatible shape for a non-contiguous array The tracebacks all point back to various code points where we are trying to set the shape of an array, e.g., offsets.shape = (-1, 2) Could you give a hint what this array's history (how it was created) and maybe .shape/.strides is? Sounds like the array is not contiguous when it is expected to be, or the attribute setting itself fails in some corner cases on master? Regards, Sebastian The original reporter of the bug dug into the commit list and suspects it was this one: https://github.com/numpy/numpy/commit/02ebf8b3e7674a6b8a06636feaa6c761fcdf4e2d However, it might be earlier than that (he is currently doing a clean rebuild to make sure). As for the history: offsets = np.asanyarray(offsets) offsets.shape = (-1, 2) # Make it Nx2 Where offsets comes in from (possibly) user-supplied data. Nothing really all that special. I will see if I can get stride information. Ben Root
Re: [Numpy-discussion] Regression in mpl: AttributeError: incompatible shape for a non-contiguous array
On Mon, Oct 29, 2012 at 11:04 AM, Patrick Marsh patrickmars...@gmail.comwrote: Turns out it isn't the commit I thought it was. I'm currently going through a git bisect to track down the actual commit that introduced this bug. I'll post back when I've found it. PTM --- Patrick Marsh Ph.D. Candidate / Liaison to the HWT School of Meteorology / University of Oklahoma Cooperative Institute for Mesoscale Meteorological Studies National Severe Storms Laboratory http://www.patricktmarsh.com On Mon, Oct 29, 2012 at 9:43 AM, Benjamin Root ben.r...@ou.edu wrote: On Mon, Oct 29, 2012 at 10:33 AM, Sebastian Berg sebast...@sipsolutions.net wrote: Hey, On Mon, 2012-10-29 at 09:54 -0400, Benjamin Root wrote: This error started showing up in the test suite for mpl when using numpy master. AttributeError: incompatible shape for a non-contiguous array The tracebacks all point back to various code points where we are trying to set the shape of an array, e.g., offsets.shape = (-1, 2) Could you give a hint what these arrays history (how it was created) and maybe .shape/.strides is? Sounds like the array is not contiguous when it is expected to be, or the attribute setting itself fails in some corner cases on master? Regards, Sebastian The original reporter of the bug dug into the commit list and suspects it was this one: https://github.com/numpy/numpy/commit/02ebf8b3e7674a6b8a06636feaa6c761fcdf4e2d However, it might be earlier than that (he is currently doing a clean rebuild to make sure). As for the history: offsets = np.asanyarray(offsets) offsets.shape = (-1, 2) # Make it Nx2 Where offsets comes in from (possibly) user-supplied data. Nothing really all that special. I will see if I can get stride information. Ben Root Further digging reveals that the code fails when the array is originally 1-D. I had an array with shape (2,) and stride (8,). The reshaping should result in a shape of (1, 2). 
Ben Root
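The distinction at issue, that assigning to .shape must never copy while reshape() silently will, can be demonstrated directly (the exact AttributeError wording varies across NumPy versions):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
v = a.T  # a transposed view: cannot be flattened in place without a copy
try:
    v.shape = (6,)  # in-place reshape would require a copy, so it raises
except AttributeError as exc:
    print("refused:", exc)

flat = v.reshape(6)  # reshape() makes the copy silently instead
print(flat)          # [0 3 1 4 2 5]

# The mpl idiom from the bug report is fine on a contiguous 1-D array:
offsets = np.asanyarray([1.0, 2.0])
offsets.shape = (-1, 2)  # Make it Nx2
print(offsets.shape)     # (1, 2)
```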
Re: [Numpy-discussion] Simple question about scatter plot graph
On Wednesday, October 31, 2012, wrote: On Wed, Oct 31, 2012 at 8:59 PM, klo uo klo...@gmail.com wrote: Thanks for your reply I suppose, variable length signals are split on equal parts and dominant harmonic is extracted. Then scatter plot shows this pattern, which has some low correlation, but I can't abstract what could be concluded from grid pattern, as I lack statistical knowledge. Maybe it's saying that data is quantized, which can't be easily seen from single sample bar chart, but perhaps scatter plot suggests that? That's only my wild guess http://pandasplotting.blogspot.ca/2012/06/lag-plot.html In general you would see a lag autocorrelation structure in the plot. My guess is that even if there is a pattern in your data we might not see it because we don't see plots that are plotted on top of each other. We only see the support of the y_t, y_{t+1} transition (points that are at least once in the sample), but not the frequencies (or conditional distribution). If that's the case, then reduce alpha level so many points on top of each other are darker, or colorcode the histogram for each y_t: bincount for each y_t and normalize, or use np.histogram directly for each y_t, then assign to each point a colorscale depending on its frequency. Did you calculate the correlation? (But maybe linear correlation won't show much.) Josef The answer is hexbin() in matplotlib when you have many points lying on or near each other. Cheers! Ben Root
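Staying within numpy, the frequency information hidden by overplotting can also be recovered with np.histogram2d before reaching for hexbin(). A small sketch on synthetic quantized data, where coincident (y_t, y_{t+1}) pairs accumulate weight instead of hiding each other:

```python
import numpy as np

rng = np.random.default_rng(0)
# Quantized data: many (y_t, y_{t+1}) pairs land on exactly the same spot.
y = rng.integers(0, 5, size=1000)
y_t, y_t1 = y[:-1], y[1:]

# Count how often each lag-1 transition occurs.
counts, xedges, yedges = np.histogram2d(y_t, y_t1, bins=5)
print(counts)        # 5x5 table of transition frequencies
print(counts.sum())  # equals the number of (y_t, y_{t+1}) pairs: 999
```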
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
On Monday, November 12, 2012, Olivier Delalleau wrote: 2012/11/12 Nathaniel Smith n...@pobox.com On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, I wanted to check that everyone knows about and is happy with the scalar casting changes from 1.6.0. Specifically, the rules for (array, scalar) casting have changed such that the resulting dtype depends on the _value_ of the scalar. Mark W has documented these changes here: http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html Specifically, as of 1.6.0: In [19]: arr = np.array([1.], dtype=np.float32) In [20]: (arr + (2**16-1)).dtype Out[20]: dtype('float32') In [21]: (arr + (2**16)).dtype Out[21]: dtype('float64') In [25]: arr = np.array([1.], dtype=np.int8) In [26]: (arr + 127).dtype Out[26]: dtype('int8') In [27]: (arr + 128).dtype Out[27]: dtype('int16') There's discussion about the changes here: http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html It seems to me that this change is hard to explain, and does what you want only some of the time, making it a false friend. The old behaviour was that in these cases, the scalar was always cast to the type of the array, right? So np.array([1], dtype=np.int8) + 256 returned 1? Is that the behaviour you prefer? I agree that the 1.6 behaviour is surprising and somewhat inconsistent. There are many places where you can get an overflow in numpy, and in all the other cases we just let the overflow happen. And in fact you can still get an overflow with arr + scalar operations, so this doesn't really fix anything. 
I find the specific handling of unsigned -> signed and float32 -> float64 upcasting confusing as well. (Sure, 2**16 isn't exactly representable as a float32, but it doesn't *overflow*, it just gives you 2.0**16... if I'm using float32 then I presumably don't care that much about exact representability, so it's surprising that numpy is working to enforce it, and definitely a separate decision from what to do about overflow.) None of those threads seem to really get into the question of what the best behaviour here *is*, though. Possibly the most defensible choice is to treat ufunc(arr, scalar) operations as performing an implicit cast of the scalar to arr's dtype, and using the standard implicit casting rules -- which I think means, raising an error if !can_cast(scalar, arr.dtype, casting='safe') I like this suggestion. It may break some existing code, but I think it'd be for the best. The current behavior can be very confusing. -=- Olivier break some existing code I really should set up an email filter for this phrase and have it send back an email automatically: Are you nuts?! We just resolved an issue where the safe casting rule unexpectedly broke existing code with regards to unplaced operations. The solution was to warn about the change in the upcoming release and to throw errors in a later release. Playing around with fundamental things like this needs to be done methodically and carefully. Cheers! Ben Root
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
On Monday, November 12, 2012, Benjamin Root wrote: [quoted text from the previous message snipped] ... We just resolved an issue where the safe casting rule unexpectedly broke existing code with regards to unplaced operations. ... Cheers! Ben Root Stupid autocorrect: unplaced -- inplace
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
On Monday, November 12, 2012, Matthew Brett wrote: Hi, On Mon, Nov 12, 2012 at 8:15 PM, Benjamin Root ben.r...@ou.edu wrote: On Monday, November 12, 2012, Olivier Delalleau wrote: 2012/11/12 Nathaniel Smith n...@pobox.com On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, I wanted to check that everyone knows about and is happy with the scalar casting changes from 1.6.0. Specifically, the rules for (array, scalar) casting have changed such that the resulting dtype depends on the _value_ of the scalar. Mark W has documented these changes here: http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html Specifically, as of 1.6.0: In [19]: arr = np.array([1.], dtype=np.float32) In [20]: (arr + (2**16-1)).dtype Out[20]: dtype('float32') In [21]: (arr + (2**16)).dtype Out[21]: dtype('float64') In [25]: arr = np.array([1.], dtype=np.int8) In [26]: (arr + 127).dtype Out[26]: dtype('int8') In [27]: (arr + 128).dtype Out[27]: dtype('int16') There's discussion about the changes here: http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html It seems to me that this change is hard to explain, and does what you want only some of the time, making it a false friend. The old behaviour was that in these cases, the scalar was always cast to the type of the array, right? So np.array([1], dtype=np.int8) + 256 returned 1? Is that the behaviour you prefer? I agree that the 1.6 behaviour is surprising and somewhat inconsistent. There are many places where you can get an overflow in numpy, and in all the other cases we just let the overflow happen. 
And in fact you can still get an overflow with arr + scalar operations, so this doesn't really fix anything. I find the specific handling of unsigned -> signed and float32 -> float64 upcasting confusing as well. (Sure, 2**16 isn't exactly representable as a float32, but it doesn't *overflow*, it just gives you 2.0**16... if I'm using float32 then I presumably don't care that much about exact representability, so it's surprising that numpy is working to enforce it, and definitely a separate decision from what to do about overflow.) None of those threads seem to really get into the question of what the best behaviour here *is*, though. Possibly the mo[...] Well, hold on though, I was asking earlier in the thread what we thought the behavior should be in 2.0 or maybe better put, sometime in the future. If we know what we think the best answer is, and we think the best answer is worth shooting for, then we can try to think of sensible ways of getting there. I guess that's what Nathaniel and Olivier were thinking of but they can correct me if I'm wrong... Cheers, Matthew I am fine with migrating to better solutions (I have yet to decide on this current situation, though), but whatever change is adopted must go through a deprecation process, which was my point. Outright breaking of code as a first step is the wrong choice, and I was merely nipping it in the bud. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] the fast way to loop over ndarray elements?
On Saturday, November 17, 2012, Chao YUE wrote: Dear all, I need to make a linear contrast of the 2D numpy array data from an interval to another, the approach is: I have another two lists: base & target, then I check for each ndarray element data[i,j], if base[m] <= data[i,j] <= base[m+1], then it will be linearly converted to be in the interval of (target[m], target[m+1]), using another function called lintrans. #The way I do is to loop each row and column of the 2D array, and finally loop the intervals constituted by base list: for row in range(data.shape[0]): for col in range(data.shape[1]): for i in range(len(base)-1): if data[row,col]>=base[i] and data[row,col]<=base[i+1]: data[row,col]=lintrans(data[row,col],(base[i],base[i+1]),(target[i],target[i+1])) break #use break to jump out of loop as the data have to be ONLY transferred ONCE. Now the profiling result shows that most of the time has been used in this loop over the array (plot_array_transg), and less time in calling the linear transformation function lintrans: ncalls tottime percall cumtime percall filename:lineno(function) 18047 0.110 0.000 0.110 0.000 mathex.py:132(lintrans) 1 12.495 12.495 19.061 19.061 mathex.py:196(plot_array_transg) so is there any way I can speed up this loop? Thanks for any suggestions!! best, Chao If the values in base are ascending, you can use searchsorted() to find out where values from data can be placed into base while maintaining order. Don't know if it is faster, but it would certainly be easier to read. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
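Ben's searchsorted() suggestion can be taken one step further: when base is ascending and the mapping is piecewise linear, np.interp performs the whole conversion in one vectorized call, with no Python loops at all. A sketch with hypothetical stand-ins for Chao's base/target/data:

```python
import numpy as np

# Hypothetical stand-ins for the variables in the question; 'base' must
# be ascending for both searchsorted() and interp() to apply.
base = np.array([0.0, 1.0, 2.0, 4.0])
target = np.array([0.0, 10.0, 20.0, 40.0])
data = np.array([[0.5, 1.5],
                 [3.0, 2.0]])

# searchsorted() locates the interval an element falls into...
idx = np.searchsorted(base, 1.5)   # 1.5 lies between base[1] and base[2]

# ...but np.interp already does the full piecewise-linear mapping from
# the 'base' breakpoints onto the 'target' breakpoints, element-wise:
out = np.interp(data, base, target)
```

With these stand-in breakpoints the mapping is just multiplication by 10, so `out` is `[[5., 15.], [30., 20.]]`; any monotonic base/target pair works the same way.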
Re: [Numpy-discussion] float32 to float64 casting
On Saturday, November 17, 2012, Charles R Harris wrote: On Sat, Nov 17, 2012 at 1:00 PM, Olivier Delalleau sh...@keba.be wrote: 2012/11/17 Gökhan Sever gokhanse...@gmail.com On Sat, Nov 17, 2012 at 9:47 AM, Nathaniel Smith n...@pobox.com wrote: On Fri, Nov 16, 2012 at 9:53 PM, Gökhan Sever gokhanse...@gmail.com wrote: Thanks for the explanations. For either case, I was expecting to get float32 as a resulting data type. Since, float32 is large enough to contain the result. I am wondering if changing casting rule this way, requires a lot of modification in the NumPy code. Maybe as an alternative to the current casting mechanism? I like the way that NumPy can convert to float64. As if these data-types are continuation of each other. But just the conversion might happen too early --at least in my opinion, as demonstrated in my example. For instance comparing this example to IDL surprises me: I16 np.float32()*5e38 O16 2.77749998e+42 I17 (np.float32()*5e38).dtype O17 dtype('float64') In this case, what's going on is that 5e38 is a Python float object, and Python float objects have double-precision, i.e., they're equivalent to np.float64's. So you're multiplying a float32 and a float64. I think most people will agree that in this situation it's better to use float64 for the output? -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion OK, I see your point. Python numeric data objects and NumPy data objects mixed operations require more attention. The following causes float32 overflow --rather than casting to float64 as in the case for Python float multiplication, and behaves like in IDL.
I3 (np.float32()*np.float32(5e38)) O3 inf However, these two still surprises me: I5 (np.float32()*1).dtype O5 dtype('float64') I6 (np.float32()*np.int32(1)).dtype O6 dtype('float64') That's because the current way of finding out the result's dtype is based on input dtypes only (not on numeric values), and numpy.can_cast('int32', 'float32') is False, while numpy.can_cast('int32', 'float64') is True (and same for int64). Thus it decides to cast to float64. It might be nice to revisit all the casting rules at some point, but current experience suggests that any changes will lead to cries of pain and outrage ;) Chuck Can we at least put these examples into the tests? Also, I think the bigger issue was that, unlike deprecation of a function, it is much harder to grep for particular operations, especially in a dynamic language like python. What were intended as minor bugfixes ended up becoming much larger. Has the casting table been added to the tests? I think that will bring much more confidence and assurances for future changes going forward. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
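The dtype-only rules Olivier describes can be written as direct checks; the float32 operands below are placeholder values, since the original session elided them:

```python
import numpy as np

# int32 does not safely cast to float32, but does to float64 -- which is
# why the int-times-float32 results above come back as float64:
assert not np.can_cast(np.int32, np.float32)
assert np.can_cast(np.int32, np.float64)

# Two float32 scalars stay float32, so a large product overflows to inf
# instead of upcasting (placeholder values):
with np.errstate(over='ignore'):
    prod = np.float32(1e38) * np.float32(5e38)
assert prod.dtype == np.float32
assert np.isinf(prod)
```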
Re: [Numpy-discussion] Allowing 0-d arrays in np.take
On Tue, Dec 4, 2012 at 8:57 AM, Sebastian Berg sebast...@sipsolutions.netwrote: Hey, Maybe someone has an opinion about this (since in fact it is new behavior, so it is undefined). `np.take` used to not allow 0-d/scalar input but did allow any other dimensions for the indices. Thinking about changing this, meaning that: np.take(np.arange(5), 0) works. I was wondering if anyone has feelings about whether this should return a scalar or a 0-d array. Typically numpy prefers scalars for these cases (indexing would return a scalar too) for good reasons, so I guess that is correct. But since I noticed this wondering if maybe it returns a 0-d array, I thought I would ask here. Regards, Sebastian At first, I was thinking that the output type should be based on what the input type is. So, if a scalar index was used, then a scalar value should be returned. But this wouldn't be true if the array had other dimensions. So, perhaps it should always be an array. The only other option is to mimic the behavior of the array indexing, which wouldn't be a bad choice. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
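For reference, the behavior current NumPy releases settled on matches the scalar preference Sebastian describes, while still mirroring array indexing for non-scalar index input:

```python
import numpy as np

a = np.arange(5) * 10

# A scalar index gives a 0-d (scalar) result, mirroring plain indexing...
s = np.take(a, 0)
assert np.ndim(s) == 0 and s == 0
assert a[0] == 0

# ...while a list/array of indices preserves the index shape:
assert np.take(a, [1, 3]).tolist() == [10, 30]
assert np.take(a, [[1], [3]]).shape == (2, 1)
```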
Re: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8
As a point of reference, python 2.4 is on RH5/CentOS5. While RH6 is the current version, there are still enterprises that are using version 5. Of course, at this point, one really should be working on a migration plan and shouldn't be doing new development on those machines... Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also?
On Thu, Dec 13, 2012 at 12:38 PM, Charles R Harris charlesr.har...@gmail.com wrote: The previous proposal to drop python 2.4 support garnered no opposition. How about dropping support for python 2.5 also? Chuck matplotlib 1.2 supports py2.5. I haven't seen any plan to move off of that for 1.3. Is there a compelling reason for dropping 2.5? Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also?
My apologies... we support 2.6 and above. +1 on dropping 2.5 support. Ben On Thu, Dec 13, 2012 at 1:12 PM, Benjamin Root ben.r...@ou.edu wrote: On Thu, Dec 13, 2012 at 12:38 PM, Charles R Harris charlesr.har...@gmail.com wrote: The previous proposal to drop python 2.4 support garnered no opposition. How about dropping support for python 2.5 also? Chuck matplotlib 1.2 supports py2.5. I haven't seen any plan to move off of that for 1.3. Is there a compelling reason for dropping 2.5? Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Insights / lessons learned from NumPy design
On Wed, Jan 9, 2013 at 9:58 AM, Nathaniel Smith n...@pobox.com wrote: On Wed, Jan 9, 2013 at 2:53 PM, Alan G Isaac alan.is...@gmail.com wrote: I'm just a Python+NumPy user and not a CS type. May I ask a naive question on this thread? Given the work that has (as I understand it) gone into making NumPy usable as a C library, why is the discussion not going in a direction like the following: What changes to the NumPy code base would be required for it to provide useful ndarray functionality in a C extension to Clojure? Is this simply incompatible with the goal that Clojure compile to JVM byte code? IIUC that work was done on a fork of numpy which has since been abandoned by its authors, so... yeah, numpy itself doesn't have much to offer in this area right now. It could in principle with a bunch of refactoring (ideally not on a fork, since we saw how well that went), but I don't think most happy current numpy users are wishing they could switch to writing Lisp on the JVM or vice-versa, so I don't think it's surprising that no-one's jumped up to do this work. If I could just point out that the attempt to fork numpy for the .NET work was done back in the subversion days, and there was little-to-no effort to incrementally merge back changes to master, and vice-versa. With git as our repository now, such work may be more feasible. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] New numpy functions: filled, filled_like
On Mon, Jan 14, 2013 at 7:38 AM, Pierre Haessig pierre.haes...@crans.orgwrote: Hi, Le 14/01/2013 00:39, Nathaniel Smith a écrit : (The nice thing about np.filled() is that it makes np.zeros() and np.ones() feel like clutter, rather than the reverse... not that I'm suggesting ever getting rid of them, but it makes the API conceptually feel smaller, not larger.) Coming from the Matlab syntax, I feel that np.zeros and np.ones are in numpy for Matlab (and maybe others ?) compatibilty and are useful for that. Now that I've been enlightened by Python, I think that those functions (especially np.ones) are indeed clutter. Therefore I favor the introduction of these two new functions. However, I think Eric's remark about masked array API compatibility is important. I don't know what other names are possible ? np.const ? Or maybe np.tile is also useful for that same purpose ? In that case adding a dtype argument to np.tile would be useful. best, Pierre I am also +1 on the idea of having a filled() and filled_like() function (I learned a long time ago to just do a = np.empty() and a.fill() rather than the multiplication trick I learned from Matlab). However, the collision with the masked array API is a non-starter for me. np.const() and np.const_like() probably make the most sense, but I would prefer a verb over a noun. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] New numpy functions: filled, filled_like
On Mon, Jan 14, 2013 at 12:27 PM, Eric Firing efir...@hawaii.edu wrote: On 2013/01/14 6:15 AM, Olivier Delalleau wrote: - I agree the name collision with np.ma.filled is a problem. I have no better suggestion though at this point. How about initialized()? A verb! +1 from me! For those wondering, I have a personal rule that because functions *do* something, they really should have verbs for their names. I have to learn to read functions like ones and empty like give me ones or give me an empty array. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] New numpy functions: filled, filled_like
On Mon, Jan 14, 2013 at 1:56 PM, David Warde-Farley d.warde.far...@gmail.com wrote: On Mon, Jan 14, 2013 at 1:12 PM, Pierre Haessig pierre.haes...@crans.org wrote: In [8]: tile(nan, (3,3)) # (it's a verb ! ) tile, in my opinion, is useful in some cases (for people who think in terms of repmat()) but not very NumPy-ish. What I'd like is a function that takes - an initial array_like a - a shape s - optionally, a dtype (otherwise inherit from a) and broadcasts a to the shape s. In the case of scalars this is just a fill. In the case of, say, a (5,) vector and a (10, 5) shape, this broadcasts across rows, etc. I don't think it's worth special-casing scalar fills (except perhaps as an implementation detail) when you have rich broadcasting semantics that are already a fundamental part of NumPy, allowing for a much handier primitive. I have similar problems with tile. I learned it for a particular use in numpy, and it would be hard for me to see it for another (contextually) different use. I do like the way you are thinking in terms of the broadcasting semantics, but I wonder if that is a bit awkward. What I mean is, if one were to use broadcasting semantics for creating an array, wouldn't one have just simply used broadcasting anyway? The point of broadcasting is to _avoid_ the creation of unneeded arrays. But maybe I can be convinced with some examples. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
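For what it's worth, the broadcast-an-array-to-a-shape primitive David sketches later appeared in NumPy (1.10) as np.broadcast_to, which returns a read-only view rather than allocating a filled array:

```python
import numpy as np

# Broadcast an initial array to a target shape, without copying:
row = np.arange(5)
b = np.broadcast_to(row, (10, 5))   # read-only view; every row is 'row'
assert b.shape == (10, 5)
assert (b[3] == row).all()

# The scalar case degenerates to a constant "fill":
f = np.broadcast_to(np.nan, (3, 3))
assert f.shape == (3, 3) and np.isnan(f).all()
```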
Re: [Numpy-discussion] Shouldn't all in-place operations simply return self?
On Thu, Jan 17, 2013 at 8:54 AM, Jim Vickroy jim.vick...@noaa.gov wrote: On 1/16/2013 11:41 PM, Nathaniel Smith wrote: On 16 Jan 2013 17:54, josef.p...@gmail.com wrote: a = np.random.random_integers(0, 5, size=5) b = a.sort() b a array([0, 1, 2, 5, 5]) b = np.random.shuffle(a) b b = np.random.permutation(a) b array([0, 5, 5, 2, 1]) How do I remember if shuffle shuffles or permutes ? Do we have a list of functions that are inplace? I rather like the convention used elsewhere in Python of naming in-place operations with present tense imperative verbs, and out-of-place operations with past participles. So you have sort/sorted, reverse/reversed, etc. Here this would suggest we name these two operations as either shuffle() and shuffled(), or permute() and permuted(). I like this (tense) suggestion. It seems easy to remember. --jv And another score for functions as verbs! :-P Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
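The sort/sorted tense convention translates directly to the NumPy pairs under discussion; a quick illustration of which ones mutate and which return copies:

```python
import numpy as np

a = np.array([3, 1, 2])

# Past participle / function -> out of place: a new array comes back.
assert np.sort(a).tolist() == [1, 2, 3]
assert a.tolist() == [3, 1, 2]          # original untouched
p = np.random.permutation(a)            # also out of place
assert p is not a and a.tolist() == [3, 1, 2]

# Imperative verb / method -> in place: mutates and returns None,
# exactly like list.sort().
assert a.sort() is None
assert a.tolist() == [1, 2, 3]
assert np.random.shuffle(a) is None
```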
Re: [Numpy-discussion] New numpy functions: filled, filled_like
On Thu, Jan 17, 2013 at 5:04 PM, Eric Firing efir...@hawaii.edu wrote: On 2013/01/17 4:13 AM, Pierre Haessig wrote: Hi, Le 14/01/2013 20:05, Benjamin Root a écrit : I do like the way you are thinking in terms of the broadcasting semantics, but I wonder if that is a bit awkward. What I mean is, if one were to use broadcasting semantics for creating an array, wouldn't one have just simply used broadcasting anyway? The point of broadcasting is to _avoid_ the creation of unneeded arrays. But maybe I can be convinced with some examples. I feel that one of the point of the discussion is : although a new (or not so new...) function to create a filled array would be more elegant than the existing pair of functions np.zeros and np.ones, there are maybe not so many usecases for filled arrays *other than zeros values*. I can remember having initialized a non-zero array *some months ago*. For the anecdote it was a vector of discretized vehicule speed values which I wanted to be initialized with a predefined mean speed value prior to some optimization. In that usecase, I really didn't care about the performance of this initialization step. So my overall feeling after this thread is - *yes* a single dedicated fill/init/someverb function would give a slightly better API, - but *no* it's not important because np.empty and np.zeros covers 95 % usecases ! I agree with your summary and conclusion. Eric Can we at least have a np.nans() and np.infs() functions? This should cover an additional 4% of use-cases. Ben Root P.S. - I know they aren't verbs... ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
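As a historical footnote for readers of the archive: the function discussed in this thread eventually landed as np.full / np.full_like in NumPy 1.8, which also covers the np.nans()/np.infs() requests:

```python
import numpy as np

a = np.full((2, 3), np.nan)
assert a.shape == (2, 3) and np.isnan(a).all()

b = np.full_like(a, np.inf)
assert np.isinf(b).all()

# Equivalent to the empty-then-fill idiom Ben mentions earlier:
c = np.empty((2, 3))
c.fill(np.nan)
assert np.isnan(c).all()
```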
Re: [Numpy-discussion] New numpy functions: filled, filled_like
On Fri, Jan 18, 2013 at 3:44 AM, Daniele Nicolodi dani...@grinta.net wrote: On 17/01/2013 23:27, Mark Wiebe wrote: Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling? np.empty((10, 10), fill=np.nan) np.empty_like(my_arr, fill=np.nan) Wouldn't it be more natural to extend the ndarray constructor? np.ndarray((10, 10), fill=np.nan) It looks more natural to me. In this way it is not possible to have the _like extension, but I don't see it as a major drawback. Cheers, Daniele This isn't a bad idea. Although, I would wager that most people, like myself, use np.array() and np.array_like() instead of np.ndarray(). We should also double-check and see how well that would fit in with the other constructors like masked arrays and matrix objects. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] New numpy functions: filled, filled_like
On Fri, Jan 18, 2013 at 11:36 AM, Daniele Nicolodi dani...@grinta.netwrote: On 18/01/2013 15:19, Benjamin Root wrote: On Fri, Jan 18, 2013 at 3:44 AM, Daniele Nicolodi dani...@grinta.net mailto:dani...@grinta.net wrote: On 17/01/2013 23:27, Mark Wiebe wrote: Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling? np.empty((10, 10), fill=np.nan) np.empty_like(my_arr, fill=np.nan) Wouldn't it be more natural to extend the ndarray constructor? np.ndarray((10, 10), fill=np.nan) It looks more natural to me. In this way it is not possible to have the _like extension, but I don't see it as a major drawback. Cheers, Daniele This isn't a bad idea. Although, I would wager that most people, like myself, use np.array() and np.array_like() instead of np.ndarray(). We should also double-check and see how well that would fit in with the other contructors like masked arrays and matrix objects. Hello Ben, I don't really get what you mean with this. np.array() construct a numpy array from an array-like object, np.ndarray() accepts a dimensions tuple as first parameter, I don't see any np.array_like in the current numpy release. Cheers, Daniele My bad, I had a brain-fart and got mixed up. I was thinking of np.empty(). In fact, I never use np.ndarray(), I use np.empty(). Besides np.ndarray() being the actual constructor, what is the difference between them? Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
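To answer the closing question concretely: both allocate without initializing the contents; np.empty is the documented high-level interface, while np.ndarray is the low-level constructor (also usable for advanced buffer/strides construction). A quick check:

```python
import numpy as np

a = np.ndarray((2, 2))   # low-level constructor, rarely called directly
b = np.empty((2, 2))     # recommended spelling for the same allocation

assert a.shape == b.shape == (2, 2)
assert a.dtype == b.dtype == np.float64   # same default dtype
```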
Re: [Numpy-discussion] np.where: x and y need to have the same shape as condition ?
On Tue, Jan 29, 2013 at 6:16 AM, denis denis-bz...@t-online.de wrote: Folks, the doc for `where` says x and y need to have the same shape as condition http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.where.html But surely where is equivalent to: [xv if c else yv for (c,xv,yv) in zip(condition,x,y)] holds as long as len(condition) == len(x) == len(y) ? And `condition` can be broadcast ? n = 3 all01 = np.array([ t for t in np.ndindex( n * (2,) )]) # 000 001 ... x = np.zeros(n) y = np.ones(n) w = np.where( all01, y, x ) # 2^n x n Can anyone please help me understand `where` / extend where is equivalent to ... ? Thanks, cheers -- denis Do keep in mind the difference between len() and shape (they aren't the same for 2 and greater dimension arrays). But, ultimately, yes, the arrays have to have the same shape, or use scalars. I haven't checked broadcast-ability though. Perhaps a note should be added into the documentation to explicitly say whether the arrays can be broadcastable. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
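To settle the broadcast-ability question raised above: np.where does broadcast all three arguments against each other, as a quick experiment shows:

```python
import numpy as np

cond = np.array([[True], [False]])   # shape (2, 1)
x = np.zeros(3)                      # shape (3,)
y = np.ones(3)                       # shape (3,)

out = np.where(cond, x, y)           # all three broadcast to (2, 3)
assert out.shape == (2, 3)
assert out.tolist() == [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
```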
Re: [Numpy-discussion] Issues to fix for 1.7.0rc2.
On Wed, Feb 6, 2013 at 4:18 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 02/06/2013 08:41 AM, Charles R Harris wrote: On Tue, Feb 5, 2013 at 11:50 PM, Jason Grout jason-s...@creativetrax.com mailto:jason-s...@creativetrax.com wrote: On 2/6/13 12:46 AM, Charles R Harris wrote: if we decide to do so I should mention that we don't really depend on either behavior (we probably should have a better doctest testing for an array of None values anyway), but we noticed the oddity and thought we ought to mention it. So it doesn't matter to us which way the decision goes. More Python craziness In [6]: print None or 0 0 In [7]: print 0 or None None To me this seems natural and is just how Python works? I think the rule for or is simply evaluate __nonzero__ of left operand, if it is False, return right operand. The reason is so that you can use it like this: x = get_foo() or get_bar() # if get_foo() returns None # use result of get_bar or def f(x=None): x = x or create_default_x() ... And what if the user passes in a zero or an empty string or an empty list, or if the return value from get_foo() is a perfectly valid zero? This is one of the very few things I have disagreed with PEP8, and Python in general about. I can understand implicit casting of numbers to booleans in order to attract the C/C++ crowd (but I don't have to like it), but what was so hard about x is not None or len(x) == 0? I like my languages explicit. Less magic, more WYSIWYM. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
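The trap Ben is pointing at is easy to demonstrate: `or` substitutes the default for *any* falsy value, not just None:

```python
def f_truthy(x=None):
    # replaces the argument whenever it is falsy
    return x or [1, 2, 3]

def f_explicit(x=None):
    # replaces the argument only when it is actually None
    return [1, 2, 3] if x is None else x

assert f_truthy([]) == [1, 2, 3]     # caller's empty list silently replaced
assert f_truthy(0) == [1, 2, 3]      # a perfectly valid zero is lost too
assert f_explicit([]) == []          # preserved, as the caller intended
assert f_explicit(None) == [1, 2, 3]
```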
Re: [Numpy-discussion] Dealing with the mode argument in qr.
On Tue, Feb 5, 2013 at 4:23 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, This post is to bring the discussion of PR #2965 (https://github.com/numpy/numpy/pull/2965) to the attention of the list. There are at least three issues in play here. 1) The PR adds modes 'big' and 'thin' to the current modes 'full', 'r', 'economic' for qr factorization. The problem is that the current 'full' is actually 'thin' and 'big' should be 'full'. The solution here was to raise a FutureWarning on use of 'full', alias it to 'thin' for the time being, and at some distant time change 'full' to alias 'big'. 2) The 'economic' mode serves little purpose. I propose to deprecate it and add a 'qrf' mode instead, corresponding to scipy's 'raw' mode. We can't use 'raw' itself as traditionally the mode may be specified using the first letter only and that leads to a conflict with 'r'. 3) As suggested in 2, the use of single letter abbreviations can constrain the options in choosing mode names and they are not as informative as the full name. A possibility here is to deprecate the use of the abbreviations in favor of the full names. A longer term problem is the divergence between the numpy and scipy versions of qr. The divergence is enough that I don't see any easy way to come to a common interface, but that is something that would be desirable if possible. Thoughts? Chuck I would definitely be in favor of deprecating abbreviations. And while we are on the topic of mode names, scipy.ndimage.filters.percentile_filter() has modes of 'mirror' and 'reflect', and I don't see any documentation stating if they are the same, or what are different about them. I just came across this yesterday. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
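For readers of the archive: the renaming eventually shipped in NumPy 1.8 as 'reduced' (the old 'full'/'thin' behavior, and the default) and 'complete' (the proposed 'big'), alongside 'r' and 'raw'. The output shapes make the distinction concrete:

```python
import numpy as np

a = np.arange(15.0).reshape(5, 3)    # a tall (5, 3) matrix

q, r = np.linalg.qr(a, mode='reduced')     # "thin": Q has the input's shape
assert q.shape == (5, 3) and r.shape == (3, 3)

q2, r2 = np.linalg.qr(a, mode='complete')  # "big": Q is square
assert q2.shape == (5, 5) and r2.shape == (5, 3)

r_only = np.linalg.qr(a, mode='r')         # just the triangular factor
assert r_only.shape == (3, 3)
```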
Re: [Numpy-discussion] Where's that function?
On Wed, Feb 6, 2013 at 1:08 PM, josef.p...@gmail.com wrote: I'm convinced that I saw a while ago a function that uses a list of interval boundaries to index into an array, either to iterate or to take. I thought that's very useful, but didn't make a note. Now, I have no idea where I saw this (I thought numpy), and I cannot find it anywhere. any clues? Some possibilities: np.array_split() np.split() np.ndindex() np.nditer() np.nested_iters() np.ravel_multi_index() Your description reminded me of a function I came across once, but I can't remember if one of these was it or if it was another one. IHTH, Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
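Two candidates fit Josef's "list of interval boundaries" description especially well; np.add.reduceat (not on Ben's list, but close in spirit) is another plausible match if the goal was aggregation over the intervals:

```python
import numpy as np

a = np.arange(10)
bounds = [3, 7]

# np.split takes boundaries (not sizes): a[:3], a[3:7], a[7:]
parts = np.split(a, bounds)
assert [p.tolist() for p in parts] == [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]

# np.add.reduceat aggregates over the same start-index list in one call:
sums = np.add.reduceat(a, [0] + bounds)
assert sums.tolist() == [3, 18, 24]
```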
Re: [Numpy-discussion] Array accumulation in numpy
On Tue, Feb 19, 2013 at 10:00 AM, Tony Ladd tl...@che.ufl.edu wrote: I want to accumulate elements of a vector (x) to an array (f) based on an index list (ind). For example: x=[1,2,3,4,5,6] ind=[1,3,9,3,4,1] f=np.zeros(10) What I want would be produced by the loop for i in range(6): f[ind[i]]=f[ind[i]]+x[i] The answer is f=array([ 0., 7., 0., 6., 5., 0., 0., 0., 0., 3.]) When I try to use implicit arguments f[ind]=f[ind]+x I get f=array([ 0., 6., 0., 4., 5., 0., 0., 0., 0., 3.]) So it takes the last value of x that is pointed to by ind and adds it to f, but it's the wrong answer when there are repeats of the same entry in ind (e.g. 3 or 1) I realize my code is incorrect, but is there a way to make numpy accumulate without using loops? I would have thought so but I cannot find anything in the documentation. Would much appreciate any help - probably a really simple question. Thanks Tony I believe you are looking for the equivalent of accumarray in Matlab? Try this: http://www.scipy.org/Cookbook/AccumarrayLike It is a bit touchy about lists and 1-D numpy arrays, but it does the job. Also, I think somebody posted an optimized version for simple sums recently to this list. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
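Besides the cookbook recipe, two builtins handle Tony's exact example: np.bincount with weights for plain sums, and np.add.at (NumPy 1.8) for the general unbuffered case:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
ind = np.array([1, 3, 9, 3, 4, 1])

# bincount sums the weights that share an index -- exactly the
# accumulation that the fancy-indexing assignment fails to do:
f = np.bincount(ind, weights=x, minlength=10)
assert f.tolist() == [0.0, 7.0, 0.0, 6.0, 5.0, 0.0, 0.0, 0.0, 0.0, 3.0]

# np.add.at is the general unbuffered in-place form (any ufunc works):
g = np.zeros(10)
np.add.at(g, ind, x)
assert (g == f).all()
```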
Re: [Numpy-discussion] Adding .abs() method to the array object
On Sat, Feb 23, 2013 at 8:20 PM, josef.p...@gmail.com wrote: On Sat, Feb 23, 2013 at 3:33 PM, Robert Kern robert.k...@gmail.com wrote: On Sat, Feb 23, 2013 at 7:25 PM, Nathaniel Smith n...@pobox.com wrote: On Sat, Feb 23, 2013 at 3:38 PM, Till Stensitzki mail.t...@gmx.de wrote: Hello, i know that the array object is already crowded, but i would like to see the abs method added, especially doing work on the console. Considering that many much less used functions are also implemented as a method, i don't think adding one more would be problematic. My gut feeling is that we have too many methods on ndarray, not too few, but in any case, can you elaborate? What's the rationale for why np.abs(a) is so much harder than a.abs(), and why this function and not other unary functions? Or even abs(a). my reason is that I often use arr.max() but then decide I want to us abs and need np.max(np.abs(arr)) instead of arr.abs().max() (and often I write that first to see the error message) I don't like np.abs(arr).max() because I have to concentrate to much on the braces, especially if arr is a calculation I wrote several times def maxabs(arr): return np.max(np.abs(arr)) silly, but I use it often and np.is_close is not useful (doesn't show how close) Just a small annoyance, but I think it's the method that I miss most often. Josef My issue is having to remember which ones are methods and which ones are functions. There doesn't seem to be a rhyme or reason for the choices, and I would rather like to see that a line is drawn, but I am not picky as to where it is drawn. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
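Worth noting for this thread: the builtin abs() already dispatches to ndarray.__abs__, so a compact method-like spelling exists without adding .abs():

```python
import numpy as np

arr = np.array([-3.0, 2.0, -0.5])

# abs() works on arrays via __abs__:
assert abs(arr).tolist() == [3.0, 2.0, 0.5]

# Josef's maxabs, as one chained expression without nested parentheses:
assert abs(arr).max() == 3.0
```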
Re: [Numpy-discussion] a question about freeze on numpy 1.7.0
On Sun, Feb 24, 2013 at 8:16 PM, Ondřej Čertík ondrej.cer...@gmail.comwrote: Hi Gelin, On Sun, Feb 24, 2013 at 12:08 AM, Gelin Yan dynami...@gmail.com wrote: Hi All When I used numpy 1.7.0 with cx_freeze 4.3.1 on windows, I quickly found out even a simple import numpy may lead to program failed with following exception: AttributeError: 'module' object has no attribute 'sys' After a poking around some codes I noticed /numpy/core/__init__.py has a line 'del sys' at the bottom. After I commented this line, and repacked the whole program, It ran fine. I also noticed this 'del sys' didn't exist on numpy 1.6.2 I am curious why this 'del sys' should be here and whether it is safe to omit it. Thanks. The del sys line was introduced in the commit: https://github.com/numpy/numpy/commit/4c0576fe9947ef2af8351405e0990cebd83ccbb6 and it seems to me that it is needed so that the numpy.core namespace is not cluttered by it. Can you post the full stacktrace of your program (and preferably some instructions how to reproduce the problem)? It should become clear where the problem is. Thanks, Ondrej I have run into issues with doing del sys before, but usually with respect to my pythonrc file. Because the import of sys has already happened, python won't let you import the module again in the same namespace (in my case, the runtime environment). I don't know how the frozen binaries work, but maybe something along those lines is happening? Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] reshaping arrays
On Sat, Mar 2, 2013 at 11:35 PM, Sudheer Joseph sudheer.jos...@yahoo.comwrote: Hi Brad, I am not getting the attribute reshape for the array, are you having a different version of numpy than mine? I have In [55]: np.__version__ Out[55]: '1.7.0' and detail of the shape details of variable In [57]: ssh?? Type: NetCDFVariable String Form:NetCDFVariable object at 0x492d3d8 Namespace: Interactive Length: 75 Docstring: NetCDF Variable In [58]: ssh.shape Out[58]: (75, 140, 180) ssh?? Type: NetCDFVariable String Form:NetCDFVariable object at 0x492d3d8 Namespace: Interactive Length: 75 Docstring: NetCDF Variable In [66]: ssh.shape Out[66]: (75, 140, 180) In [67]: ssh.reshape(75,140*180) --- AttributeErrorTraceback (most recent call last) /home/sjo/RAMA_20120807/adcp/ipython-input-67-1a21dae1d18d in module() 1 ssh.reshape(75,140*180) AttributeError: reshape Ah, you have a NetCDF variable, which in many ways purposefully looks like a NumPy array, but isn't. Just keep in mind that a NetCDF variable is merely a way to have the data available without actually reading it in until you need it. If you do: ssh_data = ssh[:] Then the NetCDF variable will read all the data in the file and return it as a numpy array that can be manipulated as you wish. I hope that helps! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
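A minimal sketch of the distinction Ben describes: a NetCDF variable defers reading until you index it, and once you pull the data out with ssh[:] you have an ordinary ndarray that supports reshape. The array here is a plain-numpy stand-in with the shapes from the post; there is no actual NetCDF file involved.

```python
import numpy as np

# Stand-in for what ssh[:] would return from the NetCDF variable:
# a real ndarray with dimensions (time, y, x).
ssh_data = np.zeros((75, 140, 180))

# reshape works on the ndarray, where it failed on the NetCDFVariable
flat = ssh_data.reshape(75, 140 * 180)
assert flat.shape == (75, 25200)
```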
Re: [Numpy-discussion] Implementing a find first style function
On Tue, Mar 5, 2013 at 9:15 AM, Phil Elson pelson@gmail.com wrote: The ticket https://github.com/numpy/numpy/issues/2269 discusses the possibility of implementing a find first style function which can optimise the process of finding the first value(s) which match a predicate in a given 1D array. For example: a = np.sin(np.linspace(0, np.pi, 200)) print find_first(a, lambda a: a > 0.9) ((71, ), 0.900479032457) This has been discussed in several locations: https://github.com/numpy/numpy/issues/2269 https://github.com/numpy/numpy/issues/2333 http://stackoverflow.com/questions/7632963/numpy-array-how-to-find-index-of-first-occurrence-of-item *Rationale* For small arrays there is no real reason to avoid doing: a = np.sin(np.linspace(0, np.pi, 200)) ind = (a > 0.9).nonzero()[0][0] print (ind, ), a[ind] (71,) 0.900479032457 But for larger arrays, this can lead to massive amounts of work even if the result is one of the first to be computed. Example: a = np.arange(1e8) print (a == 5).nonzero()[0][0] 5 So a function which terminates when the first matching value is found is desirable. As mentioned in #2269, it is possible to define a consistent ordering which allows this functionality for 1D arrays, but IMHO it overcomplicates the problem and was not a case that I personally needed, so I've limited the scope to 1D arrays only. *Implementation* My initial assumption was that to get any kind of performance I would need to write the *find* function in C, however after prototyping with some array chunking it became apparent that a trivial python function would be quick enough for my needs. The approach I've implemented in the code found in #2269 simply breaks the array into sub-arrays of maximum length *chunk_size* (2048 by default, though there is no real science to this number), applies the given predicating function, and yields the results from *nonzero()*. The given function should be a python function which operates on the whole of the sub-array element-wise (i.e. 
the function should be vectorized). Returning a generator also has the benefit of allowing users to get the first *n* matching values/indices. *Results* I timed the implementation of *find* found in my comment at https://github.com/numpy/numpy/issues/2269#issuecomment-14436725 with an obvious test: In [1]: from np_utils import find In [2]: import numpy as np In [3]: import numpy.random In [4]: np.random.seed(1) In [5]: a = np.random.randn(1e8) In [6]: a.min(), a.max() Out[6]: (-6.1194900990552776, 5.9632246301166321) In [7]: next(find(a, lambda a: np.abs(a) > 6)) Out[7]: ((33105441,), -6.1194900990552776) In [8]: (np.abs(a) > 6).nonzero() Out[8]: (array([33105441]),) In [9]: %timeit (np.abs(a) > 6).nonzero() 1 loops, best of 3: 1.51 s per loop In [10]: %timeit next(find(a, lambda a: np.abs(a) > 6)) 1 loops, best of 3: 912 ms per loop In [11]: %timeit next(find(a, lambda a: np.abs(a) > 6, chunk_size=10)) 1 loops, best of 3: 470 ms per loop In [12]: %timeit next(find(a, lambda a: np.abs(a) > 6, chunk_size=100)) 1 loops, best of 3: 483 ms per loop This shows that picking a sensible *chunk_size* can yield massive speed-ups (nonzero is x3 slower in one case). A similar example with a much smaller 1D array shows similar promise: In [41]: a = np.random.randn(1e4) In [42]: %timeit next(find(a, lambda a: np.abs(a) > 3)) 1 loops, best of 3: 35.8 us per loop In [43]: %timeit (np.abs(a) > 3).nonzero() 1 loops, best of 3: 148 us per loop As I commented on the issue tracker, if you think this function is worth taking forward, I'd be happy to open up a pull request. Feedback gratefully received. Cheers, Phil In the interest of generalizing code and such, could such approaches be used for functions like np.any() and np.all() for short-circuiting if True or False (respectively) are found? I wonder what other sort of functions in NumPy might benefit from this? Ben Root
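A minimal sketch of the chunked approach Phil describes. This is an illustrative reconstruction, not the actual code attached to #2269; the generator name and chunking details follow the post's description but should be treated as assumptions.

```python
import numpy as np

def find(a, predicate, chunk_size=2048):
    """Yield (index-tuple, value) pairs for elements of 1-D `a` that
    satisfy `predicate`, scanning chunk by chunk so the search can
    stop as soon as the caller has seen enough matches."""
    for start in range(0, len(a), chunk_size):
        chunk = a[start:start + chunk_size]
        # predicate is vectorized: it operates on the whole sub-array
        for idx in predicate(chunk).nonzero()[0]:
            yield (start + idx,), a[start + idx]

a = np.arange(1000000.0)
ind, val = next(find(a, lambda c: c == 5))
assert ind == (5,) and val == 5.0
```

Because it is a generator, `next()` pulls only the first match, and iterating further yields the first *n* matches without scanning the rest of the array.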
Re: [Numpy-discussion] feature tracking in numpy/scipy
On Sat, Mar 2, 2013 at 5:32 PM, Scott Collis scollis.a...@gmail.com wrote: Good afternoon list, I am looking at feature tracking in a 2D numpy array, along the lines of Dixon and Wiener 1993 (for tracking precipitating storms) Identifying features based on threshold is quite trivial using ndimage.label b_fld=np.zeros(mygrid.fields['rain_rate_A']['data'].shape) rr=10 b_fld[mygrid.fields['rain_rate_A']['data'] > rr]=1.0 labels, numobjects = ndimage.label(b_fld[0,0,:,:]) (note mygrid.fields['rain_rate_A']['data'] is dimensions time, height, y, x) using the matplotlib contouring and fetching the vertices I can get a nice list of polygons of rain rate above a certain threshold… Now from here I can just go and implement the Dixon and Wiener methodology but I thought I would check here first to see if anyone knows of an object/feature tracking algorithm in numpy/scipy or using numpy arrays (it just seems like something people would want to do!).. i.e. something that looks back and forward in time and identifies polygon movement and identifies objects with temporal persistence.. Cheers! Scott Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm Identification, Tracking, Analysis, and Nowcasting—A Radar-based Methodology. *Journal of Atmospheric and Oceanic Technology*, *10*, 785–797, doi:10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2. http://journals.ametsoc.org/doi/abs/10.1175/1520-0426%281993%29010%3C0785%3ATTITAA%3E2.0.CO%3B2 Say hello to my PhD project: https://github.com/WeatherGod/ZigZag In it, I have the centroid-tracking portion of the TITAN code, along with SCIT, and hooks into MHT. Several of the dependencies are also available in my repositories. Cheers! Ben P.S. - I have personally met Dr. Dixon on multiple occasions and he is a great guy to work with. Feel free to email him or myself with questions about TITAN.
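A self-contained sketch of the threshold-and-label step from Scott's post, with a small synthetic rain-rate field standing in for the real `mygrid` data (the field values and shapes here are made up for illustration):

```python
import numpy as np
from scipy import ndimage

# Hypothetical 2D rain-rate slice with two separated rainy regions
rain = np.zeros((10, 10))
rain[2:4, 2:4] = 20.0   # feature one
rain[7:9, 6:9] = 15.0   # feature two

# Threshold at rr, then label the connected regions above it
rr = 10
b_fld = np.zeros(rain.shape)
b_fld[rain > rr] = 1.0
labels, numobjects = ndimage.label(b_fld)
assert numobjects == 2
```

Each distinct connected region gets its own integer label in `labels`, which is the starting point for the frame-to-frame matching that TITAN-style tracking performs.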
Re: [Numpy-discussion] timezones and datetime64
On Wed, Apr 3, 2013 at 7:52 PM, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: Personally, I never need finer resolution than seconds, nor more than a century, so it's no big deal to me, but just wondering A use case for finer resolution than seconds (in our field, no less!) is lightning data. At the last SciPy conference, a fellow meteorologist mentioned how difficult it was to plot out lightning data at resolutions finer than microseconds (which is the resolution of the python datetime objects). Matplotlib has not supported the datetime64 object yet (John passed before he could write up that patch). Cheers! Ben By the way, my 12th Rule of Programming is Never roll your own datetime ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] datetime64 1970 issue
On Tue, Apr 16, 2013 at 7:45 PM, Ondřej Čertík ondrej.cer...@gmail.comwrote: On Tue, Apr 16, 2013 at 4:55 PM, Bob Nnamtrop bob.nnamt...@gmail.com wrote: I am curious if others have noticed an issue with datetime64 at the beginning of 1970. First: In [144]: (np.datetime64('1970-01-01') - np.datetime64('1969-12-31')) Out[144]: numpy.timedelta64(1,'D') OK this look fine, they are one day apart. But look at this: In [145]: (np.datetime64('1970-01-01 00') - np.datetime64('1969-12-31 00')) Out[145]: numpy.timedelta64(31,'h') Hmmm, seems like there are 7 extra hours? Am I missing something? I don't see this at any other year. This discontinuity makes it hard to use the datetime64 object without special adjustment in ones code. I assume this a bug? Indeed, this looks like a bug, I can reproduce it on linux as well: In [1]: import numpy as np In [2]: np.datetime64('1970-01-01') - np.datetime64('1969-12-31') Out[2]: numpy.timedelta64(1,'D') In [3]: np.datetime64('1970-01-01 00') - np.datetime64('1969-12-31 00') Out[3]: numpy.timedelta64(31,'h') Maybe, maybe not... were you alive then? For all we know, Charles and co. were partying an extra 7 hours every day back then? Just sayin' Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] datetime64 1970 issue
On Wed, Apr 17, 2013 at 7:10 PM, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: On Wed, Apr 17, 2013 at 1:09 PM, Bob Nnamtrop bob.nnamt...@gmail.com wrote: It would seem that before 1970 the dates do not include the time zone adjustment while after 1970 they do. This is the source of the extra 7 hours. In [21]: np.datetime64('1970-01-01 00') Out[21]: numpy.datetime64('1970-01-01T00:00-0700','h') In [22]: np.datetime64('1969-12-31 00') Out[22]: numpy.datetime64('1969-12-31T00:00Z','h') In [111]: np.datetime64('1970-01-01 00').view(np.int64) Out[111]: 8 indicates that it is doing the input translation differently, as the underlying value is wrong for one. (another weird note -- I'm in pacific time, which is -7 now, with DST, so why the 8?) Aren't we on standard time at Jan 1st? So, at that date, you would have been -8. Ben Root
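For reference, in later NumPy releases the local-timezone parsing behavior discussed here was removed and datetime64 became timezone-naive, so the subtraction is uniform across the epoch boundary. A small check, assuming a modern NumPy:

```python
import numpy as np

# With timezone-naive datetime64, the pre/post-epoch discontinuity
# described in this thread does not appear: exactly 24 hours apart.
delta = np.datetime64('1970-01-01T00') - np.datetime64('1969-12-31T00')
assert delta == np.timedelta64(24, 'h')
```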
Re: [Numpy-discussion] datetime64 1970 issue
On Thu, Apr 18, 2013 at 2:27 AM, Joris Van den Bossche jorisvandenboss...@gmail.com wrote: ANyone tested this on Windows? On Windows 7, numpy 1.7.0 (Anaconda 1.4.0 64 bit), I don't even get a wrong answer, but an error: In [3]: np.datetime64('1969-12-31 00') Out[3]: numpy.datetime64('1969-12-31T00:00Z','h') In [4]: np.datetime64('1970-01-01 00') --- OSError Traceback (most recent call last) ipython-input-4-ebf323268a4e in module() 1 np.datetime64('1970-01-01 00') OSError: Failed to use 'mktime' to convert local time to UTC Have you tried np.test()? I know some of the tests I added awhile back utilized pre-epoch dates to test sorting and min/max finding. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] type conversion question
On Thu, Apr 18, 2013 at 7:31 PM, K.-Michael Aye kmichael@gmail.comwrote: I don't understand why sometimes a direct assignment of a new dtype is possible (but messes up the values), and why at other times a seemingly harmless upcast (in my potentially ignorant point of view) is not possible. So, maybe a direct assignment of a new dtype is actually never a good idea? (I'm asking), and one should always go the route of newarray= array(oldarray, dtype=newdtype), but why then sometimes the upcast provides an error and forbids it and sometimes not? Examples: In [140]: slope.read_center_window() In [141]: slope.data.dtype Out[141]: dtype('float32') In [142]: slope.data[1,1] Out[142]: 10.044398 In [143]: val = slope.data[1,1] In [144]: slope.data.dtype='float64' In [145]: slope.data[1,1] Out[145]: 586.98938070189865 #- #Here, the value of data[1,1] has completely changed (and so has the rest of the array), and no error was given. # But then... # In [146]: val.dtype Out[146]: dtype('float32') In [147]: val Out[147]: 10.044398 In [148]: val.dtype='float64' --- AttributeErrorTraceback (most recent call last) ipython-input-148-52a373a41cac in module() 1 val.dtype='float64' AttributeError: attribute 'dtype' of 'numpy.generic' objects is not writable === end of code So why is there an error in the 2nd case, but no error in the first case? Is there a logic to it? When you change a dtype like that in the first one, you aren't really upcasting anything. You are changing how numpy interprets the underlying bits. Because you went from a 32-bit element size to a 64-bit element size, you are actually seeing the double-precision representation of 2 of your original data points together. The correct way to cast is to do something like a = slope.data.astype('float64'). That makes a copy and does the casting as safely as possible. As for the second one, you have what is called a numpy scalar. These aren't quite the same thing as a numpy array, and can be a bit more restrictive. 
Can you imagine what sort of issues that would pose if one could start viewing and modifying neighboring chunks of memory without ever having to mess around with pointers? It would be a hacker's dream! I hope that clears things up. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
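The difference Ben describes can be shown with the supported spellings: .view() reinterprets the same bytes (which is effectively what the in-place dtype assignment did), while .astype() copies and converts the values safely. A minimal illustration:

```python
import numpy as np

data = np.arange(4, dtype='float32')   # 4 elements, 16 bytes

# Reinterpreting the bytes: two float32 values per float64 slot,
# so the element count halves and the values become garbage.
reinterpreted = data.view('float64')
assert reinterpreted.shape == (2,)

# A real cast: copies the buffer and converts each value.
cast = data.astype('float64')
assert cast.dtype == np.float64
assert cast[1] == 1.0
```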
Re: [Numpy-discussion] 1.8 release
On Thu, Apr 25, 2013 at 11:16 AM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, I think it is time to start the runup to the 1.8 release. I don't know of any outstanding blockers but if anyone has a PR/issue that they feel needs to be in the next Numpy release now is the time to make it known. Chuck Has a np.minmax() function been added yet? I know it keeps getting +1's whenever suggested, but I haven't seen it done yet. Another annoyance is the lack of a np.nanmean() and np.nanstd() function. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] nanmean(), nanstd() and other missing functions for 1.8
Currently, I am in the process of migrating some co-workers from Matlab and IDL, and the number one complaint I get is that numpy has nansum() but no nanmean() and nanstd(). While we do have an alternative in the form of masked arrays, most of these people are busy enough trying to port their existing code over to python that this sort of stumbling block becomes difficult to explain away. Given how relatively simple these functions are, I cannot think of any reason not to include these functions in v1.8. Of course, the documentation for these functions should certainly include mention of masked arrays. There is one other non-trivial function that has been discussed before: np.minmax(). My thinking is that it would return a 2xN array (where N is whatever size of the result that would be returned if just np.min() was used). This would allow one to do min, max = np.minmax(X). Are there any other functions that others feel are missing from numpy and would like to see for v1.8? Let's discuss them here. Cheers! Ben Root
Re: [Numpy-discussion] nanmean(), nanstd() and other missing functions for 1.8
On Wed, May 1, 2013 at 1:13 AM, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: Of course, the documentation for discussed before: np.minmax(). My thinking is that it would return a 2xN array How about a tuple: (min, max)? I am not familiar enough with numpy internals to know which is the better approach to implement. I kind of feel that the 2xN array approach would be more flexible in case a user wants all of this information in a single array, while still allowing for unpacking as if it was a tuple. I would rather enable unforeseen use-cases rather than needlessly restricting them. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal of new function: iteraxis()
On Mon, Apr 29, 2013 at 2:10 PM, Andrew Giessel andrew_gies...@hms.harvard.edu wrote: Matthew: Thanks for the link to array order discussion. Any more thoughts on Phil's slice() function? I rather like Phil's solution. Just some caveats. Will it always return views or copies? It should be one or the other (I haven't looked closely enough to check), and it should be documented to that effect. Plus, tests should be added to make sure it does that. Cheers! Ben Root
Re: [Numpy-discussion] nanmean(), nanstd() and other missing functions for 1.8
So, to summarize the thread so far: Consensus: np.nanmean() np.nanstd() np.minmax() np.argminmax() Vague Consensus: np.sincos() No Consensus (possibly out of scope for this topic): Better constructors for complex types I can probably whip up the PR for the nanmean() and nanstd(), and can certainly help out with the minmax funcs. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
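For illustration, a minimal nanmean() along the lines of what such a PR would provide. This is a sketch, not the actual implementation that landed in NumPy 1.8, and it ignores edge cases like all-NaN slices.

```python
import numpy as np

def nanmean(a, axis=None):
    """Mean ignoring NaNs: sum the non-NaN values and divide by
    the count of non-NaN values."""
    a = np.asarray(a, dtype=float)
    mask = np.isnan(a)
    total = np.where(mask, 0.0, a).sum(axis=axis)
    count = (~mask).sum(axis=axis)
    return total / count

x = np.array([1.0, np.nan, 3.0])
assert nanmean(x) == 2.0
```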
Re: [Numpy-discussion] nanmean(), nanstd() and other missing functions for 1.8
I have created a PR for the first two (and got np.nanvar() for free). https://github.com/numpy/numpy/pull/3297 Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy.nanmin, numpy.nanmax, and scipy.stats.nanmean
On Thu, May 16, 2013 at 6:09 PM, Phillip Feldman phillip.m.feld...@gmail.com wrote: It seems odd that `nanmin` and `nanmax` are in NumPy, while `nanmean` is in SciPy.stats. I'd like to propose that a `nanmean` function be added to NumPy. Have no fear. There is already plans for its inclusion in the next release: https://github.com/numpy/numpy/pull/3297/files Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy sprints at Scipy 2013, Austin: call for topics and hands to help
On Sat, May 25, 2013 at 12:37 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, May 25, 2013 at 9:51 AM, David Cournapeau courn...@gmail.comwrote: On Sat, May 25, 2013 at 4:19 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, May 25, 2013 at 8:23 AM, David Cournapeau courn...@gmail.com wrote: Hi there, I agreed to help organising NumPy sprints during the scipy 2013 conference in Austin. As some of you may know, Stéfan and me will present a tutorial on NumPy C code, so if we do our job correctly, we should have a few new people ready to help out during the sprints. It would be good to: - have some focus topics for improvements - know who is going to be there at the sprint to work on things and/or help newcomers Things I'd like to work on myself is looking into splitting things from multiarray, think about a better internal API for dtype registration/hooks (with the goal to remove any date dtype hardcoding in both multiarray and ufunc machinery), but I am sure others have more interesting ideas :) I'd like to get a 1.8 beta out or at least get to the point where we can make that leap. Sure, I am fine doing this in a branch post 1.8.x, I am not in a hurry. There is a lot of new stuff that needs to be tested, PR's to go through, and I have a suspicion that a memory allocation error might have crept in somewhere. Will you be there at the conference ? Yes. I'm not very good at sprinting though. I prefer to amble with a big screen, nice keyboard, and a cup of coffee ;) Chuck Oh, I am sure we could get you set up with a projector screen and a nice bluetooth keyboard... Now, I think you just hit on something with the coffee. I don't recall previous sprints having coffee available. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] genfromtxt() skips comments
On Fri, May 31, 2013 at 5:08 PM, Albert Kottke albert.kot...@gmail.comwrote: I noticed that genfromtxt() did not skip comments if the keyword names is not True. If names is True, then genfromtxt() would take the first line as the names. I am proposing a fix to genfromtxt that skips all of the comments in a file, and potentially using the last comment line for names. This will allow reading files with and without comments and/or names. The difference is here: https://github.com/arkottke/numpy/compare/my-genfromtxt Careful with semantics here. First off, using the last comment line as the source for names might initially make sense, except when there are comments within the data file. I would suggest going for last comment line before the first line of data. Second, sometimes the names come from an un-commented first line, but comments are still used within the file elsewhere. Just some food for thought. I don't know if the current design is best or not. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
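For reference, genfromtxt() already accepts a commented header line when names=True; the semantics Ben raises concern files where comments also appear elsewhere or the names line is uncommented. A minimal example of the already-supported case (the column names here are made up):

```python
import numpy as np
from io import StringIO

# Header given as a comment line directly above the data; with
# names=True, genfromtxt strips the comment marker and uses it.
text = StringIO("# a b c\n1 2 3\n4 5 6\n")
data = np.genfromtxt(text, names=True)
assert data.dtype.names == ('a', 'b', 'c')
```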
Re: [Numpy-discussion] suggested change of behavior for interp
Could non-monotonicity be detected as part of the interp process? Perhaps a sign switch in the deltas? I have been bitten by this problem too. Cheers! Ben Root On Jun 4, 2013 9:08 PM, Eric Firing efir...@hawaii.edu wrote: On 2013/06/04 2:05 PM, Charles R Harris wrote: On Tue, Jun 4, 2013 at 12:07 PM, Slavin, Jonathan jsla...@cfa.harvard.edu mailto:jsla...@cfa.harvard.edu wrote: Hi, I would like to suggest that the behavior of numpy.interp be changed regarding treatment of situations in which the x-coordinates are not monotonically increasing. Specifically, it seems to me that interp should work correctly when the x-coordinate is decreasing monotonically. Clearly it cannot work if the x-coordinate is not monotonic, but in that case it should raise an exception. Currently if x is not increasing it simply silently fails, providing incorrect values. This fix could be as simple as a monotonicity test and inversion if necessary (plus a raise statement for non-monotonic cases). Seems reasonable, although it might add a bit of execution time. The monotonicity test should be an option if it is available at all; when interpolating a small number of points from a large pair of arrays, the single sweep through the whole array could dominate the execution time. Checking for increasing versus decreasing, in contrast, can be done fast, so handling the decreasing case transparently is reasonable. Eric Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
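A sketch of the proposed behavior as a wrapper around the existing np.interp: detect the sign of the deltas, reverse a decreasing x-coordinate, and raise on anything non-monotonic. The helper name is hypothetical.

```python
import numpy as np

def interp_checked(x, xp, fp):
    """np.interp that handles decreasing xp and rejects
    non-monotonic xp instead of silently returning garbage."""
    d = np.diff(xp)
    if np.all(d > 0):
        return np.interp(x, xp, fp)
    if np.all(d < 0):
        # reverse both coordinates so xp is increasing
        return np.interp(x, xp[::-1], fp[::-1])
    raise ValueError("xp must be monotonically increasing or decreasing")

xp = np.array([3.0, 2.0, 1.0])   # decreasing: plain np.interp fails here
fp = np.array([30.0, 20.0, 10.0])
assert interp_checked(2.5, xp, fp) == 25.0
```

As Eric notes, the increasing-vs-decreasing check itself is cheap; it is the full monotonicity sweep that adds a pass over the array, which is why it might be optional in a real implementation.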
Re: [Numpy-discussion] floats coerced to string with {:f}.format() ?
You can treat a record in a record array like a tuple or a dictionary when it comes to formatting. So, either refer to the index element you want formatted as a float, or refer to it by name (in the formatting language). By just doing '{:f}', you are just grabbing the first one, which is XXYYZZ and trying to format that. But remember, you can only do this to a single record at a time, not the entire record array at once. Regular numpy arrays can not be formatted in this manner, hence your other attempt failures. Cheers! Ben Root On Thu, Jun 6, 2013 at 3:48 PM, Maccarthy, Jonathan K jkm...@lanl.gov wrote: Hi everyone, I've looked in the mailing list archives and with the googles, but haven't yet found any hints with this question... I have a float field in a NumPy record that looks like it's being substituted as a string in the Python '{:f}'.format() mini-language, thus throwing an error: In [1]: tmp = np.rec.array([('XYZZ', 2001123, -23.82396)], dtype=np.dtype([('sta', '|S6'), ('ondate', 'i8'), ('lat', 'f4')]))[0] In [2]: type(tmp) Out[3]: numpy.core.records.record In [3]: tmp Out[3]: ('XYZZ', 2001123, -23.823917388916016) In [4]: tmp.sta, tmp.ondate, tmp.lat Out[4]: ('XYZZ', 2001123, -23.823917) # strings and integers work In [5]: '{0.sta:6.6s} {0.ondate:8d}'.format(tmp) Out[5]: 'XYZZ 2001123' # lat is a float, but it seems to be coerced to a string first, and failing In [6]: '{0.sta:6.6s} {0.ondate:8d} {0.lat:11.6f}'.format(tmp) --- ValueError Traceback (most recent call last) /Users/jkmacc/ipython-input-312-bff8066cfde8 in module() 1 '{0.sta:6.6s} {0.ondate:8d} {0.lat:11.6f}'.format(tmp) ValueError: Unknown format code 'f' for object of type 'str' # string formatting doesn't fail In [7]: '{0.sta:6.6s} {0.ondate:8d} {0.lat:11.6s}'.format(tmp) Out[7]: 'XYZZ 2001123 -23.82' This also fails: In [7]: '{:f}'.format(np.array(3.2, dtype='f4')) --- ValueError Traceback (most recent call last) /Users/jkmacc/ipython-input-314-33119128e3e6 in module() 1 '{:f}'.format(np.array(3.2, dtype='f4')) ValueError: Unknown format code 'f' for object of type 'str' Does anyone understand what's happening? Thanks for your help. Best, Jon
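A practical workaround for the failure Jon shows: convert the float32 scalar to a Python float before applying the 'f' format code. This is a sketch using the record layout from the post.

```python
import numpy as np

rec = np.rec.array([('XYZZ', 2001123, -23.82396)],
                   dtype=[('sta', 'S6'), ('ondate', 'i8'), ('lat', 'f4')])[0]

# Coercing the numpy float32 scalar to a Python float sidesteps the
# string coercion seen in the traceback above.
line = '{:11.6f}'.format(float(rec.lat))
assert line.strip().startswith('-23.82')
```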
Re: [Numpy-discussion] numpy.filled, again
On Thu, Jun 13, 2013 at 9:36 AM, Aldcroft, Thomas aldcr...@head.cfa.harvard.edu wrote: On Wed, Jun 12, 2013 at 2:55 PM, Eric Firing efir...@hawaii.edu wrote: On 2013/06/12 8:13 AM, Warren Weckesser wrote: That's why I suggested 'filledwith' (add the underscore if you like). This also allows a corresponding masked implementation, 'ma.filledwith', without clobbering the existing 'ma.filled'. Consensus on np.filled? absolutely not, you do not have a consensus. np.filledwith or filled_with: fine with me, maybe even with everyone--let's see. I would prefer the underscore version. +1 on np.filled_with. It's unique and the meaning is extremely obvious. We do use np.ma.filled in astropy so a big -1 on deprecating that (which would then require doing numpy version checks to get the right method). Even when there is an NA dtype the numpy.ma users won't go away anytime soon. I like np.filled_with(), but just to be devil's advocate, think of the syntax: np.filled_with((10, 24), np.nan) As I read that, I am filling the array with (10, 24), not NaNs. Minor issue, for sure, but just thought I'd raise that. -1 on deprecation of np.ma.filled(). -1 on np.filled() due to collision with np.ma (both conceptually and programmatically). np.values() might be a decent alternative. Cheers! Ben Root
Re: [Numpy-discussion] numpy.filled, again
On Fri, Jun 14, 2013 at 1:21 PM, Robert Kern robert.k...@gmail.com wrote: On Fri, Jun 14, 2013 at 6:18 PM, Eric Firing efir...@hawaii.edu wrote: On 2013/06/14 5:15 AM, Alan G Isaac wrote: On 6/14/2013 9:27 AM, Aldcroft, Thomas wrote: If I just saw np.values(..) in some code I would never guess what it is doing from the name That suggests np.fromvalues. But more important than the name I think is allowing broadcasting of the values, based on NumPy's broadcasting rules. Broadcasting a scalar is then a special case, even if it is the case that has dominated this thread. True, but this looks to me like mission creep. All of this fuss is about replacing two lines of user code with a single line. If it can't be kept simple, both in implementation and in documentation, it shouldn't be done at all. I'm not necessarily opposed to your suggestion, but I'm skeptical. It's another two-liner: [~] |1 x = np.empty([3,4,5]) [~] |2 x[...] = np.arange(5) [~] |3 x array([[[ 0., 1., 2., 3., 4.], [ 0., 1., 2., 3., 4.], [ 0., 1., 2., 3., 4.], [ 0., 1., 2., 3., 4.]], [[ 0., 1., 2., 3., 4.], [ 0., 1., 2., 3., 4.], [ 0., 1., 2., 3., 4.], [ 0., 1., 2., 3., 4.]], [[ 0., 1., 2., 3., 4.], [ 0., 1., 2., 3., 4.], [ 0., 1., 2., 3., 4.], [ 0., 1., 2., 3., 4.]]]) It's wafer-thin! True, but wouldn't we rather want to encourage the use of broadcasting in the numerical operations rather than creating new arrays from broadcasted arrays? a = np.arange(5) + np.ones((3, 4, 5)) b = np.filled((3, 4, 5), np.arange(5)) + np.ones((3, 4, 5)) The first one is much easier to read, and is more efficient than the second (theoretical) one because it needs to create two (3, 4, 5) arrays rather than just one. That being said, one could make a similar argument against ones(), zeros(), etc. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
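Robert's two-liner wrapped as a function is essentially the proposal under discussion. The name `filled` here is the proposal's, not an existing NumPy function; NumPy 1.8 ultimately shipped the scalar case of this behavior as np.full.

```python
import numpy as np

def filled(shape, value):
    """Sketch of the proposed constructor: allocate and assign,
    letting broadcasting handle scalars and arrays alike."""
    a = np.empty(shape)
    a[...] = value
    return a

# scalar fill, equivalent to what became np.full
assert (filled((2, 2), 7.0) == 7.0).all()

# broadcast fill, the generalization Alan suggested
b = filled((3, 4, 5), np.arange(5))
assert b.shape == (3, 4, 5) and b[2, 3].tolist() == [0.0, 1.0, 2.0, 3.0, 4.0]
```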
Re: [Numpy-discussion] numpy.filled, again
On Fri, Jun 14, 2013 at 1:22 PM, Nathaniel Smith n...@pobox.com wrote: On Wed, Jun 12, 2013 at 7:43 PM, Eric Firing efir...@hawaii.edu wrote: On 2013/06/12 2:10 AM, Nathaniel Smith wrote: Personally I think that overloading np.empty is horribly ugly, will continue confusing newbies and everyone else indefinitely, and I'm 100% convinced that we'll regret implementing such a warty interface for something that should be so idiomatic. (Unfortunately I got busy and didn't actually say this in the previous thread though.) So I think we should just merge the PR as is. The only downside is the np.ma inconsistency, but, np.ma is already inconsistent (cf. masked_array.fill versus masked_array.filled!), somewhat deprecated, somewhat deprecated? Really? Since when? By whom? Replaced by what? Sorry, not trying to start a fight, just trying to summarize the situation. As far as I can tell: Oh... (puts away iron knuckles) Despite heroic efforts on the part of its authors, numpy.ma has a number of weird quirks (masked data can still trigger invalid value errors), misfeatures (hard versus soft masks), and just plain old pain points (ongoing issues with whether any given operation will respect or preserve the mask). Actually, now that we have a context manager for warning capture, we could actually fix that. It's been in deep maintenance mode for some time; we merge the occasional bug fix that people send in, and that's it. (To be fair, numpy as a whole is fairly slow-moving, but numpy.ma still gets much less attention.) Even if there were active maintainers, no-one really has any idea how to fix any of the problems above; they're not so much bugs as intrinsic limitations of the design. Therefore, my impression is that a majority (not all, but a majority) of numpy developers strongly recommend against the use of numpy.ma in new projects. Such a recommendation should be in writing in the documentation and elsewhere. Furthermore, a proper replacement would also be needed. 
Just simply deprecating it without some sort of decent alternative leaves everybody in a lurch. I have high hopes for NA to be that replacement, and the sooner, the better. I could be wrong! And I know there's nothing to really replace it. I'd like to fix that. But I think semi-deprecated is not an unfair shorthand for the above. You will have to pry np.ma from my cold, dead hands! (or distract me with a sufficiently shiny alternative) (I'll even admit that I'd *like* to actually deprecate it. But what I mean by that is, I don't think it's possible to fix it to the point where it's actually a solid/clean/robust library, so I'd like to reach a point where everyone who's currently using it is happier switching to something else and is happy to sign off on deprecating it.) As far as many people are concerned, it is a solid, clean, robust library. and AFAICT there are far more people who will benefit from a clean np.filled idiom than who actually use np.ma (and in particular its fill-value functionality). So there would be two I think there are more np.ma users than you realize. Everyone who uses matplotlib is using np.ma at least implicitly, if not explicitly. Many of the matplotlib examples put np.ma to good use. np.ma.filled is an essential long-standing part of the np.ma API. I don't see any good rationale for generating a conflict with it, when an adequate non-conflicting alternative ('np.initialized', maybe others) exists. I'm aware of that. If I didn't care about the opinions of numpy.ma users, I wouldn't go starting long and annoying mailing list threads about features that are only problematic because of their effect on numpy.ma :-). But, IMHO given the issues with numpy.ma, our number #1 priority ought to be making numpy proper as clean and beautiful as possible; my position that started this thread is basically just that we shouldn't make numpy proper worse just for numpy.ma's sake. That's the tail wagging the dog. 
And this 'conflict' seems a bit overstated given that (1) np.ma.filled already has multiple names (and 3/4 of the uses in matplotlib use the method version, not the function version), (2) even if we give it a non-conflicting name, np.ma's lack of maintenance means that it'd probably be years before someone got around to actually adding a parallel function to np.ma. [Unless this thread spurs someone into submitting one just to prove me wrong ;-).] Actually, IIRC, np.ma does some sort of auto-wrapping of numpy functions. This is why adding np.filled() would cause a namespace clobbering, I think. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
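For readers revisiting this thread later: the function that eventually landed in NumPy core was named np.full (added in NumPy 1.8), sidestepping the clash with np.ma.filled entirely. A short sketch of the two distinct operations being conflated in the discussion above:

```python
import numpy as np
import numpy.ma as ma

# np.full (the name that landed in NumPy 1.8 instead of the proposed
# np.filled) creates a brand-new array of a given shape and fill value.
a = np.full((2, 3), 7.0)

# np.ma.filled is unrelated: it replaces the *masked* entries of an
# existing masked array with a fill value, returning a plain ndarray.
m = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
b = ma.filled(m, -999.0)
```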
Re: [Numpy-discussion] time to revisit NA/ma ideas
On Fri, Jun 14, 2013 at 6:38 PM, Eric Firing efir...@hawaii.edu wrote: A nice summary of the discussions from a year ago is here: http://www.numpy.org/NA-overview.html It provides food for thought. Eric Perhaps a BoF session should be put together for SciPy 2013, possibly with a Google Hangout for it, to bring interested parties into the discussion? Ben Root 
Re: [Numpy-discussion] Allow == and != to raise errors
I can see what you are getting at, but I have to disagree. First of all, when a comparison between two mis-shaped arrays occurs, you get back a bona fide Python boolean, not a NumPy array of bools. So any action taken on the result of such a comparison that assumed the result was some sort of array would fail (yes, this does make it a bit difficult to trace back the source of the problem, but not impossible). Second, no semantics are broken with this. Are the arrays equal or not? If they aren't broadcastable, then returning False for == and True for != makes perfect sense to me. At least, that is my take on it. Cheers! Ben Root On Fri, Jul 12, 2013 at 8:38 AM, Sebastian Berg sebast...@sipsolutions.net wrote: Hey, the array comparisons == and != never raise errors but instead simply return False for invalid comparisons. The main examples are arrays of non-matching dimensions, and object arrays with invalid element-wise comparisons: In [1]: np.array([1,2,3]) == np.array([1,2]) Out[1]: False In [2]: np.array([1, np.array([2, 3])], dtype=object) == [1, 2] Out[2]: False This seems wrong to me, and I am sure not just to me. I doubt any large project makes use of such comparisons, and I assume most would prefer the shape mismatch to raise an error, so I would like to change it. But I am a bit unsure, especially about smaller projects. To keep the transition a bit safer, I could imagine implementing a FutureWarning for these cases (which would at least notify new users that what they are doing doesn't seem like the right thing). So the question is: is such a change safe enough, or is there some good reason for the current behavior that I am missing? 
Regards, Sebastian (There may be other issues with structured types that would continue returning False, I think, because neither side knows how to compare)
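To make the behavior under discussion concrete, here is a small sketch. Note that the mismatched-shape case is version-dependent: the plain Python False described above was later replaced by a DeprecationWarning and, in recent NumPy releases, an error, so the sketch accepts either outcome:

```python
import numpy as np
import warnings

# Same-shape comparison: an elementwise boolean array, as expected.
eq = np.array([1, 2, 3]) == np.array([1, 2, 4])

# Mismatched shapes: at the time of this thread this returned a plain
# Python False; later NumPy versions warn or raise instead.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    try:
        res = np.array([1, 2, 3]) == np.array([1, 2])
    except Exception:
        res = "raised"
```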
Re: [Numpy-discussion] What should be the result in some statistics corner cases?
This is going to need to be heavily documented with doctests. Also, just to clarify, are we talking about a ValueError for doing a nansum on an empty array as well, or will that now return a zero? Ben Root On Mon, Jul 15, 2013 at 9:52 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Jul 14, 2013 at 2:55 PM, Warren Weckesser warren.weckes...@gmail.com wrote: On 7/14/13, Charles R Harris charlesr.har...@gmail.com wrote: Some corner cases in the mean, var, std. *Empty arrays* I think these cases should either raise an error or just return nan. Warnings seem ineffective to me, as they are only issued once by default. In [3]: ones(0).mean() /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:61: RuntimeWarning: invalid value encountered in double_scalars ret = ret / float(rcount) Out[3]: nan In [4]: ones(0).var() /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: RuntimeWarning: invalid value encountered in true_divide out=arrmean, casting='unsafe', subok=False) /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: RuntimeWarning: invalid value encountered in double_scalars ret = ret / float(rcount) Out[4]: nan In [5]: ones(0).std() /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: RuntimeWarning: invalid value encountered in true_divide out=arrmean, casting='unsafe', subok=False) /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: RuntimeWarning: invalid value encountered in double_scalars ret = ret / float(rcount) Out[5]: nan *ddof >= number of elements* I think these should just raise errors. The results for ddof >= #elements are happenstance, and certainly negative numbers should never be returned. 
In [6]: ones(2).var(ddof=2) /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: RuntimeWarning: invalid value encountered in double_scalars ret = ret / float(rcount) Out[6]: nan In [7]: ones(2).var(ddof=3) Out[7]: -0.0 *nansum* Currently returns nan for empty arrays. I suspect it should return nan for slices that are all nan, but 0 for empty slices. That would make it consistent with sum in the empty case. For nansum, I would expect 0 even in the case of all nans. The point of these functions is to simply ignore nans, correct? So I would aim for this behaviour: nanfunc(x) behaves the same as func(x[~isnan(x)]) Agreed, although that changes current behavior. What about the other cases? Looks like there isn't much interest in the topic, so I'll just go ahead with the following choices: Non-NaN case 1) Empty array -> ValueError. The current behavior with stats is an accident, i.e., the nan arises from 0/0. I like to think that in this case the result is any number, rather than not a number, so *the* value is simply not defined. So in this case raise a ValueError for an empty array. 2) ddof >= n -> ValueError. If the number of elements, n, is not zero and ddof >= n, raise a ValueError for the ddof value. NaN case 1) Empty array -> ValueError 2) Empty slice -> NaN 3) For slice ddof >= n -> NaN Chuck 
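For concreteness, the accidental 0/0 behavior described above can still be reproduced (the warning texts and file paths differ across NumPy versions, but the NaN results are stable):

```python
import numpy as np
import warnings

# Mean of an empty array: 0/0 inside the implementation yields NaN,
# accompanied by a RuntimeWarning rather than an exception.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    m = np.ones(0).mean()

# ddof equal to the number of elements: again 0/0, again NaN.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    v = np.ones(2).var(ddof=2)
```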
Re: [Numpy-discussion] What should be the result in some statistics corner cases?
On Jul 15, 2013 11:47 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Jul 15, 2013 at 8:58 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg sebast...@sipsolutions.net wrote: On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote: On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris charlesr.har...@gmail.com wrote: snip For nansum, I would expect 0 even in the case of all nans. The point of these functions is to simply ignore nans, correct? So I would aim for this behaviour: nanfunc(x) behaves the same as func(x[~isnan(x)]) Agreed, although that changes current behavior. What about the other cases? Looks like there isn't much interest in the topic, so I'll just go ahead with the following choices: Non-NaN case 1) Empty array -> ValueError. The current behavior with stats is an accident, i.e., the nan arises from 0/0. I like to think that in this case the result is any number, rather than not a number, so *the* value is simply not defined. So in this case raise a ValueError for an empty array. To be honest, I don't mind the current behaviour much: sum([]) = 0, len([]) = 0, so it is in a way well defined. At least I am not sure I would always prefer an error. I am a bit worried that just changing it might break code out there, such as plotting code where it makes perfect sense to plot a NaN (i.e. nothing), but if that is the case it would probably become visible fast. 2) ddof >= n -> ValueError. If the number of elements, n, is not zero and ddof >= n, raise a ValueError for the ddof value. Makes sense to me, especially for ddof > n. Just returning nan in all cases for backward compatibility would be fine with me too. Currently if ddof > n it returns a negative number for variance; the NaN only comes when ddof == 0 and n == 0, leading to 0/0 (float is NaN, integer is zero division). 
NaN case 1) Empty array -> ValueError 2) Empty slice -> NaN 3) For slice ddof >= n -> NaN Personally, I would somewhat prefer it if 1) and 2) at least defaulted to the same thing. But I don't use the nanfuncs anyway. I was wondering about adding an option for the user to pick what the fill is (e.g., if it is None (maybe the default) -> ValueError). We could also allow this for normal reductions without an identity, but I am not sure it is useful there. In the NaN case some slices may be empty, others not. My reasoning is that that is going to be data dependent, not operator error, but if the array is empty the writer of the code should deal with that. In the case of nanvar and nanstd, it might make more sense to handle ddof as 1) if ddof is >= the axis size, raise ValueError 2) if ddof is >= the number of values after removing NaNs, return NaN The first would be consistent with the non-nan case; the second accounts for the variable nature of data containing NaNs. Chuck I think this is a good idea in that it naturally follows the conventions of what to do with empty arrays / empty slices with nanmean, etc. Note, however, I am not a very big fan of the idea of having two different behaviors for what I see as semantically the same thing. But my objections are not strong enough to veto it, and I do think this proposal is well thought-out. Ben Root 
Re: [Numpy-discussion] What should be the result in some statistics corner cases?
To add a bit of context to the question of nansum on empty results: we currently differ from MATLAB and R in this respect; they return zero no matter what. Personally, I think it should return zero, but our current behavior of returning nans has existed for a long time, so I think we need a deprecation warning and possibly to wait until 2.0 to change this, with plenty of warning that it will change. Ben Root On Jul 15, 2013 8:46 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Jul 15, 2013 at 6:22 PM, Stéfan van der Walt ste...@sun.ac.za wrote: On Mon, 15 Jul 2013 08:33:47 -0600, Charles R Harris wrote: On Mon, Jul 15, 2013 at 8:25 AM, Benjamin Root ben.r...@ou.edu wrote: This is going to need to be heavily documented with doctests. Also, just to clarify, are we talking about a ValueError for doing a nansum on an empty array as well, or will that now return a zero? I was going to leave nansum as is, as it seems that the result was by choice rather than by accident. That makes sense--I like Sebastian's explanation whereby operations that define an identity yield that identity upon empty input. So nansum should return zeros rather than the current NaNs? Chuck 
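The resolution this thread converged on did ship in a later NumPy release: nansum now follows the func(x[~isnan(x)]) rule, so empty and all-NaN inputs yield the sum identity, 0, matching MATLAB and R:

```python
import numpy as np

# nansum treats NaNs as absent, so the empty and all-NaN cases
# both reduce to the sum of no elements, whose identity is 0.
s_empty = np.nansum([])
s_allnan = np.nansum([np.nan, np.nan])
s_mixed = np.nansum([1.0, np.nan, 2.0])
```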
Re: [Numpy-discussion] azip
Forgive my ignorance, but have numpy and scipy stopped doing that weird doc editing thing that existed back in the days of Trac? I have actually held back on submitting doc edits because I hated using that thing so much. 
Re: [Numpy-discussion] azip
Well, that's nice to know now. However, I distinctly remember being told that any changes made to the docstrings directly in the source would end up getting replaced by whatever was in the doc edit system whenever a merge from it happened. Therefore, if one wanted their edits to be persistent, they had to submit them through the doc edit system. Note, much of my animosity towards the doc edit system was due to scipy.org being so sluggish back then, and the length of time it took for any edits to finally make it down to the docstrings. Now that scipy.org is much more responsive, and numpy and scipy have moved on to git, perhaps those two issues are gone now? Sorry for hijacking the thread; this is just the first I am hearing that one can submit documentation edits via PRs, and I was surprised. Cheers! Ben Root On Thu, Jul 18, 2013 at 1:51 PM, Pauli Virtanen p...@iki.fi wrote: 18.07.2013 20:18, Benjamin Root kirjoitti: Forgive my ignorance, but have numpy and scipy stopped doing that weird doc editing thing that existed back in the days of Trac? I have actually held back on submitting doc edits because I hated using that thing so much. You were never required to use it. -- Pauli Virtanen 
Re: [Numpy-discussion] fresh performance hits: numpy.linalg.pinv 30% slowdown
On Mon, Jul 22, 2013 at 10:55 AM, Yaroslav Halchenko li...@onerussian.com wrote: At some point I hope to tune up the report with an option of viewing the plot using e.g. nvd3 JS so it could be easier to pinpoint/analyze interactively. shameless plug... the soon-to-be-finalized matplotlib-1.3 has a WebAgg backend that allows for interactivity. Cheers! Ben Root 
Re: [Numpy-discussion] .flat (was: add .H attribute?)
On Tue, Jul 23, 2013 at 10:11 AM, Stéfan van der Walt ste...@sun.ac.za wrote: On Tue, Jul 23, 2013 at 3:39 PM, Alan G Isaac alan.is...@gmail.com wrote: On 7/23/2013 9:09 AM, Pauli Virtanen wrote: .flat which I think is rarely used Don't assume .flat is not commonly used. A common idiom in Matlab is a[:] to flatten an array. When porting code over from Matlab, it is typical to replace that with either a.flat or a.flatten(), depending on whether an iterator or an array is needed. Cheers! Ben Root 
Re: [Numpy-discussion] .flat
On Tue, Jul 23, 2013 at 10:46 AM, Pauli Virtanen p...@iki.fi wrote: 23.07.2013 17:34, Benjamin Root kirjoitti: [clip] Don't assume .flat is not commonly used. A common idiom in matlab is a[:] to flatten an array. When porting code over from matlab, it is typical to replace that with either a.flat or a.flatten(), depending on whether an iterator or an array is needed. It is much more rarely used than `ravel()` and `flatten()`, as can be verified by grepping e.g. the matplotlib source code. The matplotlib source code is not a port from Matlab, so grepping that wouldn't prove anything. Meanwhile, the NumPy for Matlab users page notes that a.flatten() makes a copy. A newbie to NumPy would then (correctly) look up the documentation for a.flatten() and see in the See Also section that a.flat is just an iterator rather than a copy, and would often use that to avoid the copy. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
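The distinction being argued here is easy to demonstrate. A small sketch, assuming a C-contiguous array (which is what lets ravel() return a view rather than a copy):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# a.flatten() always returns a copy: writing to it leaves `a` untouched.
f = a.flatten()
f[0] = 99
flatten_is_copy = (a[0, 0] == 0)

# a.ravel() returns a view when it can (here: contiguous input),
# so writing through it modifies `a`.
r = a.ravel()
r[0] = 99
ravel_is_view = (a[0, 0] == 99)

# a.flat is a 1-D iterator over `a`; indexed assignment writes through.
a.flat[1] = 42
```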
Re: [Numpy-discussion] fresh performance hits: numpy.linalg.pinv 30% slowdown
On Mon, Jul 22, 2013 at 1:28 PM, Yaroslav Halchenko li...@onerussian.com wrote: On Mon, 22 Jul 2013, Benjamin Root wrote: At some point I hope to tune up the report with an option of viewing the plot using e.g. nvd3 JS so it could be easier to pinpoint/analyze interactively. shameless plug... the soon-to-be-finalized matplotlib-1.3 has a WebAgg backend that allows for interactivity. that's just sick! do you know about any motion in python-sphinx world on supporting it? is there any demo page you would recommend to assess what to expect supported in upcoming webagg? Oldie but goodie: http://mdboom.github.io/blog/2012/10/11/matplotlib-in-the-browser-its-coming/ Official announcement: http://matplotlib.org/1.3.0/users/whats_new.html#webagg-backend Note, this is different from what is now available in the IPython Notebook (it isn't really interactive there). As for what is supported: just about everything you can do normally can be done in WebAgg. I have no clue about sphinx-level support. Now, back to your regularly scheduled program. Cheers! Ben Root 
Re: [Numpy-discussion] add .H attribute?
On Wed, Jul 24, 2013 at 8:47 AM, Daπid davidmen...@gmail.com wrote: An idea: If .H is ideally going to be a view, and we want to keep it this way, we could have a .h() method with the present implementation. This would preserve the name .H for the conjugate view --when someone finds the way to do it. This way we would increase the readability, simplify some matrix algebra code, and keep the API consistency. I could get behind a .h() method until .H attribute is ready. +1 Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
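For reference, here is what the proposed ndarray.H attribute (or the interim .h() method) would compute: the conjugate transpose. Spelled out today as a.conj().T, the .conj() step makes a copy for complex input, so this is not a view -- which is exactly the objection discussed above:

```python
import numpy as np

# The conjugate transpose that a hypothetical A.H would return.
# Note: .conj() copies complex data, so this expression is not a view.
A = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, 2 + 0j]])
AH = A.conj().T
```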
Re: [Numpy-discussion] import overhead of numpy.testing
On Aug 10, 2013 12:50 PM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Sat, Aug 10, 2013 at 5:21 PM, Andrew Dalke da...@dalkescientific.com wrote: [Short version: It doesn't look like my proposal or any simple alternative is tenable.] On Aug 10, 2013, at 10:28 AM, Ralf Gommers wrote: It does break backwards compatibility though, because now you can do: import numpy as np np.testing.assert_equal(x, y) Yes, it does. I realize that a design goal in numpy was that (most?) submodules are available without any additional imports. This is the main reason for the import numpy overhead. The tension between ease-of-use for some and overhead for others is well known. For example, Sage tickets 3577, 6494, and 11714 relate to deferring numpy import during startup. The three relevant questions are: 1) is numpy.testing part of that promise? This can be split into multiple ways. o The design goal could be that only the numerics that people use for interactive/REPL computing are accessible without additional explicit imports, which implies that the import of numpy.testing is an implementation side-effect of providing submodule-level test() and bench() APIs o all NumPy modules with user-facing APIs should be accessible from numpy without additional imports While I would like to believe that the import of numpy.testing is an implementation side-effect of providing test() and bench(), I believe that I'm unlikely to convince the majority. It likely is a side-effect rather than intentional design, but at this point that doesn't matter much anymore. There never was a clear distinction between private and public modules and now, as your investigation shows, the cost of removing the import is quite high. For justifiable reasons, the numpy project is loath to break backwards compatibility, and I don't think there's an existing bright-line policy which would say that import numpy; numpy.testing should be avoided. 
2) If it isn't a promise that numpy.testing is usable after an import numpy then how many people will be affected by an implementation change, and at what level of severity? I looked to see which packages might fail. A Debian code search of numpy.testing showed no problems, and no one uses np.testing. I did a code search at http://code.ohloh.net . Of the first 200 or so hits for numpy.testing, nearly all of them fell into uses like: from numpy.testing import Tester from numpy.testing import assert_equal, TestCase from numpy.testing.utils import * from numpy.testing import * There were, however, several packages which would fail: test_io.py and test_image.py and test_array_bridge.py in MediPy (Interestingly, test_circle.py has a import numpy.testing, so it's not universal practice in that package) calculators_test.py in OpenQuake Engine ForcePlatformsExtractorTest.py in b-tk Note that these failures are in the test programs, and not in the main body code, so are unlikely to break end-user programs. HOWEVER! The real test is for people who do import numpy as np then refer to np.testing. There are about 454 such matches in Ohloh. One example is 'test_polygon.py' from scikit-image. Others are: test_anova.py in statsmodel test_graph.py in scikit-learn test_rmagic.py in IPython test_mlab.py in matplotlib Nearly all the cases I looked at were in files starting test, or a handful which ended in test.py or Test.py. Others use np.test only as part of a unit test, such as: affine_grid.py and others in pyMor (as part of in-file unit tests) technical_indicators.py in QuantPy (as part of unittest.TestCase) coord_tools.py in NiPy-OLD (as part of in-file unit tests) predstd.py and others in statsmodels (as a main-line unit test) galsim_test_helpers.py in GalSim These would likely not break end-user code. Sadly, not all are that safe. 
For examples: simple_contrast.py example program for nippy try_signal_lti.py in joePython run.py in python-seminar verify.py in bell_d_project (a final project for a CS class) ex_shrink_pickle.py in statsmodels (as an example?) parametric_design.py in nippy (uses assert_almost_equal to verify an example) model.py in pymc-devs's pymc model.py in duffy zipline in olmar utils.py in MNE and I gave up at result 320 of 454. Based on this, about 1% of the programs which use numpy.testing would break. This tells me that there are enough user programs which would fail that I don't think numpy will decide to make this change. And the third question is 3) Are there other alternatives? Or as Ralf Gommers wrote: Do you have more detailed timings? I'm guessing the bottleneck is importing nose. I do have more detailed timings. nose is not imported during an import numpy. (For one, import nose takes a full 0.11 seconds on my laptop and adds 199 modules to sys.modules!) The hit is
Re: [Numpy-discussion] import overhead of numpy.testing
On Aug 11, 2013 5:02 AM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Sun, Aug 11, 2013 at 3:35 AM, Benjamin Root ben.r...@ou.edu wrote: Would there be some sort of way to detect that numpy.testing wasn't explicitly imported and issue a deprecation warning? Say, move the code into numpy._testing, import it into the namespace as testing, but then have the testing.py file set a flag in _testing to indicate that an explicit import has occurred? Eventually, even _testing would no longer get imported by default and all would be well. Of course, that might be too convoluted? I'm not sure how that would work (you didn't describe how to decide that the import was explicit), but imho the impact would be too high. Ralf The idea would be that within numpy (and we should fix SciPy as well), we would always import numpy._testing as testing, and not import testing.py ourselves. Then, there would be a flag in _testing.py that would, by default, emit warnings about using np.testing without an explicit import, stating by which version all code will have to be switched (perhaps 2.0?). testing.py would do a from _testing import *, but also set the flag in _testing to not emit warnings, because only a non-numpy (and non-SciPy) module would have imported it. It isn't foolproof. If a project has multiple dependencies that use np.testing, and only one of them explicitly imports np.testing, then the warning becomes hidden for the non-compliant parts. However, if we make sure that the core SciPy projects use np._testing, it would go a long way towards getting the word out. Again, I am just throwing it out there as an idea. The speedups we are getting right now are nice, so it is entirely possible that this kludge is just not worth the last remaining bits of extra time. Cheers! Ben Root 
Re: [Numpy-discussion] import overhead of numpy.testing
On Aug 11, 2013 4:37 PM, Andrew Dalke da...@dalkescientific.com wrote: On Aug 11, 2013, at 10:24 PM, Benjamin Root wrote: The idea would be that within numpy (and we should fix SciPy as well), we would always import numpy._testing as testing, and not import testing.py ourselves. The problem is the existing code out there which does: import numpy as np ... np.testing.utils.assert_almost_equal(x, y) (that is, without an additional import), and other code which does from numpy.testing import * I wouldn't consider having them both emit a warning. The latter one is an explicit import (albeit a horrible one). IIRC, that should import testing.py and deactivate the warnings. However, from numpy import testing would be a problem... Drat... Forget I said anything. The idea wouldn't work. Ben 
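Whatever the import mechanics, the user-visible contract this thread kept circling back to can be stated in two lines (recent NumPy versions satisfy it lazily via module-level __getattr__, which recovers most of the import-time cost debated here):

```python
import sys
import numpy as np

# The compatibility contract: after a bare `import numpy`, np.testing
# must be usable with no further import statement.
np.testing.assert_equal([1, 2], [1, 2])

# Once np.testing has been touched, the submodule is in sys.modules
# regardless of whether it was loaded eagerly or on first access.
loaded = "numpy.testing" in sys.modules
```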
Re: [Numpy-discussion] RAM problem during code execution - Numpy arrays
On Fri, Aug 23, 2013 at 10:34 AM, Francesc Alted franc...@continuum.io wrote: Hi José, The code is somewhat longish for a pure visual inspection, but my advice is that you install memory profiler (https://pypi.python.org/pypi/memory_profiler). This will help you determine which line or lines are hugging the memory the most. Saludos, Francesc On Fri, Aug 23, 2013 at 3:58 PM, Josè Luis Mietta joseluismie...@yahoo.com.ar wrote: Hi experts. I need your help with a RAM problem during execution of my script. I wrote the code below. I use SAGE. In 1-2 hours of execution time the RAM of my laptop (8 GB) is filled and the system crashes:

    from scipy.stats import uniform
    import numpy as np

    cant_de_cadenas = [700, 800, 900]

    cantidad_de_cadenas = np.array([])
    for k in cant_de_cadenas:
        cantidad_de_cadenas = np.append(cantidad_de_cadenas, k)
    cantidad_de_cadenas = np.transpose(cantidad_de_cadenas)

    b = 10
    h = b
    Longitud = 1
    numero_experimentos = 150

    densidad_de_cadenas = cantidad_de_cadenas/(b**2)

    prob_perc = np.array([])
    tiempos = np.array([])
    S_int = np.array([])
    S_medio = np.array([])
    desviacion_standard = np.array([])
    desviacion_standard_nuevo = np.array([])
    anisotropia_macroscopica_porcentual = np.array([])
    componente_y = np.array([])
    componente_x = np.array([])

    import time

    for N in cant_de_cadenas:
        empieza = time.clock()
        PERCOLACION = np.array([])
        size_medio_intuitivo = np.array([])
        size_medio_nuevo = np.array([])
        std_dev_size_medio_intuitivo = np.array([])
        std_dev_size_medio_nuevo = np.array([])
        comp_y = np.array([])
        comp_x = np.array([])

        for u in xrange(numero_experimentos):
            perco = False
            array_x1 = uniform.rvs(loc=-b/2, scale=b, size=N)
            array_y1 = uniform.rvs(loc=-h/2, scale=h, size=N)
            array_angle = uniform.rvs(loc=-0.5*(np.pi), scale=np.pi, size=N)
            array_pendiente_x = 1./np.tan(array_angle)

            random = uniform.rvs(loc=-1, scale=2, size=N)
            lambda_sign = np.zeros([N])
            for t in xrange(N):
                if random[t] < 0:
                    lambda_sign[t] = -1
                else:
                    lambda_sign[t] = 1
            array_lambdas = (lambda_sign*Longitud)/np.sqrt(1+array_pendiente_x**2)

            array_x2 = array_x1 + array_lambdas*array_pendiente_x
            array_y2 = array_y1 + array_lambdas*1

            array_x1 = np.append(array_x1, [-b/2, b/2, -b/2, -b/2])
            array_y1 = np.append(array_y1, [-h/2, -h/2, -h/2, h/2])
            array_x2 = np.append(array_x2, [-b/2, b/2, b/2, b/2])
            array_y2 = np.append(array_y2, [h/2, h/2, -h/2, h/2])

            M = np.zeros([N+4, N+4])

            for j in xrange(N+4):
                if j > 0:
                    x_A1B1 = array_x2[j]-array_x1[j]
                    y_A1B1 = array_y2[j]-array_y1[j]
                    x_A1A2 = array_x1[0:j]-array_x1[j]
                    y_A1A2 = array_y1[0:j]-array_y1[j]
                    x_A2A1 = -1*x_A1A2
                    y_A2A1 = -1*y_A1A2
                    x_A2B2 = array_x2[0:j]-array_x1[0:j]
                    y_A2B2 = array_y2[0:j]-array_y1[0:j]
                    x_A1B2 = array_x2[0:j]-array_x1[j]
                    y_A1B2 = array_y2[0:j]-array_y1[j]
                    x_A2B1 = array_x2[j]-array_x1[0:j]
                    y_A2B1 = array_y2[j]-array_y1[0:j]
                    p1 = x_A1B1*y_A1A2 - y_A1B1*x_A1A2
                    p2 = x_A1B1*y_A1B2 - y_A1B1*x_A1B2
                    p3 = x_A2B2*y_A2B1 - y_A2B2*x_A2B1
                    p4 = x_A2B2*y_A2A1 - y_A2B2*x_A2A1
                    condicion_1 = p1*p2
                    condicion_2 = p3*p4
                    for k in xrange(j):
                        if condicion_1[k] <= 0 and condicion_2[k] <= 0:
                            M[j, k] = 1
                    del condicion_1
                    del condicion_2
                if j+1 < N+4:
                    x_A1B1 = array_x2[j]-array_x1[j]
                    y_A1B1 = array_y2[j]-array_y1[j]
                    x_A1A2 = array_x1[j+1:]-array_x1[j]
                    y_A1A2 = array_y1[j+1:]-array_y1[j]
                    x_A2A1 = -1*x_A1A2
                    y_A2A1 = -1*y_A1A2
                    x_A2B2 = array_x2[j+1:]-array_x1[j+1:]
                    y_A2B2 = array_y2[j+1:]-array_y1[j+1:]
                    x_A1B2 = array_x2[j+1:]-array_x1[j]
                    y_A1B2 = array_y2[j+1:]-array_y1[j]
                    x_A2B1 = array_x2[j]-array_x1[j+1:]
                    y_A2B1 = array_y2[j]-array_y1[j+1:]
                    p1 = x_A1B1*y_A1A2 - y_A1B1*x_A1A2
                    p2 = x_A1B1*y_A1B2 - y_A1B1*x_A1B2
                    p3 = x_A2B2*y_A2B1 - y_A2B2*x_A2B1
                    p4 = x_A2B2*y_A2A1 - y_A2B2*x_A2A1
                    condicion_1 = p1*p2
                    condicion_2 = p3*p4
                    for k in xrange((N+4)-j-1):
                        if condicion_1[k] <= 0 and condicion_2[k] <= 0:
                            M[j, k+j+1] = 1
                    del condicion_1
                    del condicion_2

            M[N, N+2] = 0
            M[N, N+3] = 0
            M[N+1, N+2] = 0
            M[N+1, N+3] = 0
            M[N+2, N] = 0
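If installing memory_profiler is not an option, the standard library's tracemalloc (Python 3.4+) serves the same purpose Francesc describes: pinpointing which lines hold the memory. A minimal sketch, using a plain allocation as a stand-in for one iteration of the script above:

```python
import tracemalloc

tracemalloc.start()

# Stand-in for an allocation-heavy step such as M = np.zeros([N+4, N+4]).
M = [[0.0] * 704 for _ in range(704)]

# Group currently live allocations by source line, biggest first.
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")
```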
Re: [Numpy-discussion] 1.8.0 branch reminder
On Mon, Aug 26, 2013 at 11:01 AM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Sun, Aug 18, 2013 at 6:36 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Aug 18, 2013 at 12:17 PM, Charles R Harris charlesr.har...@gmail.com wrote: Just a reminder that 1.8.0 will be branched tonight. I've put up a big STY: PR https://github.com/numpy/numpy/pull/3635 that removes trailing whitespace and fixes spacing after commas. I would like to apply it before the branch, but it may cause merge difficulties down the line. I'd like feedback on that option. I've also run autopep8 on the code and it does a nice job of cleaning things up. It gets a little lost in deeply nested lists, but there aren't too many of those. By default it doesn't fix spaces around operators (it seems). I can apply that also if there is interest in doing so. Depends on how many lines of code it touches. For scipy we decided not to do this, because it would make git blame pretty much useless. Ralf At some point, you just have to bite the bullet. Matplotlib has been doing PEP8 work for about a year now. We adopted very specific rules on how that work was to be done (make PEP8-only PRs, each PEP8 PR touches at most one module at a time, etc.). Yes, git blame does make it look like NelleV has taken over the project, but the trade-off is readability. We even discovered a non-trivial number of bugs this way. For a core library like NumPy that has lots of very obscure-looking code that almost never gets changed, avoiding PEP8 is problematic because it always becomes Somebody Else's Problem. Of course, it is entirely up to you, the devs, on what to do for NumPy and SciPy, but that is what matplotlib is doing. Cheers! Ben Root 
Re: [Numpy-discussion] Array addition inconsistency
On Thu, Aug 29, 2013 at 8:04 AM, Robert Kern robert.k...@gmail.com wrote: On Thu, Aug 29, 2013 at 12:00 PM, Martin Luethi lue...@vaw.baug.ethz.ch wrote: Dear all, After some surprise, I noticed an inconsistency while adding array slices: a = np.arange(5) a[1:] = a[1:] + a[:-1] a array([0, 1, 3, 5, 7]) versus inplace a = np.arange(5) a[1:] += a[:-1] a array([ 0, 1, 3, 6, 10]) My suspicion is that the second variant does not create intermediate storage, and thus works on the intermediate result, effectively performing a.cumsum(). Correct. Not creating intermediate storage is the point of using augmented assignment. This can be very sneaky. a = np.arange(5) a[:-1] = a[:-1] + a[1:] a array([1, 3, 5, 7, 4]) a = np.arange(5) a[:-1] += a[1:] a array([1, 3, 5, 7, 4]) So, if someone is semi-careful and tries out that example, they might (incorrectly) assume that such operations are safe, without realizing that it was safe only because the values of a[1:] were ahead of the values of a[:-1] in memory. I could easily imagine a situation where views of an array are passed around, only to finally end up in an in-place operation like this and sometimes be right and sometimes be wrong. Maybe there is some simple check that could be performed to detect this sort of situation? Cheers! Ben Root 
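The check asked for at the end of this email did eventually land: since NumPy 1.13, ufuncs detect memory overlap between inputs and output and copy to a temporary, so the two spellings agree on modern NumPy; on the 2013-era NumPy of this thread, the in-place form effectively computed a cumulative sum. A sketch accepting either era's result:

```python
import numpy as np

# Explicit temporary on the right-hand side: always [0, 1, 3, 5, 7].
a = np.arange(5)
a[1:] = a[1:] + a[:-1]

# In-place form: same result on NumPy >= 1.13 (overlap is detected and
# a temporary is made); cumsum-like [0, 1, 3, 6, 10] on older versions.
b = np.arange(5)
b[1:] += b[:-1]
```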
Re: [Numpy-discussion] Relative speed
On Aug 29, 2013 4:11 PM, Jonathan T. Niehof jnie...@lanl.gov wrote: On 08/29/2013 01:48 PM, Ralf Gommers wrote: Thanks. I had read that quite differently, and I'm sure I'm not the only one. Some context would have helped. My apologies -- that was a rather obtuse reference.

Just for future reference, the language and the community are full of references like these. IDLE is named for Eric Idle, one of the members of Monty Python, while Guido's title of BDFL is a reference to a sketch. But I am sure you never expected that... :-p Cheers! Ben Root
Re: [Numpy-discussion] Bug (?) converting list to array
The two lists are of different lengths -- had to count twice to catch that. (The first row has six elements, the second seven, so NumPy cannot form a rectangular 2-D array and falls back to a 1-D object array of lists.) Ben Root

On Mon, Sep 9, 2013 at 9:46 AM, Chad Kidder cckid...@gmail.com wrote: I'm trying to enter a 2-D array and np.array() is returning a 1-D array of lists. I'm using Python (x,y) on Windows 7 with numpy 1.7.1. Here's the code that is giving me issues:

>>> f1 = [[15.207, 15.266, 15.181, 15.189, 15.215, 15.198],
...       [-45, -57, -62, -70, -72, -73.5, -77]]
>>> f1a = np.array(f1)
>>> f1a
array([[15.207, 15.266, 15.181, 15.189, 15.215, 15.198],
       [-45, -57, -62, -70, -72, -73.5, -77]], dtype=object)

What am I missing?
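To illustrate the pitfall: equal-length rows give a proper 2-D numeric array, ragged rows do not. As added context not in the thread, this behaviour has since tightened -- NumPy 1.24 and later raise a ValueError for ragged input unless dtype=object is passed explicitly:

```python
import numpy as np

# Equal-length rows: a rectangular 2-D float array, as intended.
good = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
assert good.shape == (2, 3)

# Ragged rows cannot form a rectangle. A simple length check like
# this catches the bug in the original post early.
rows = [[1.0, 2.0, 3.0], [4.0, 5.0]]
lengths = {len(r) for r in rows}
assert len(lengths) > 1  # rows have differing lengths: ragged input

# Old NumPy silently produced a 1-D object array of lists here;
# NumPy >= 1.24 requires dtype=object to be spelled out.
ragged = np.array(rows, dtype=object)
assert ragged.shape == (2,)
```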
Re: [Numpy-discussion] PEP8
On Sat, Sep 7, 2013 at 7:56 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, I've been doing some PEP8 work using autopep8. One problem that has turned up is that the default behavior of autopep8 is version dependent. I'd like to put a script in numpy tools that runs autopep8 with some features disabled, namely:

1. E226 -- puts spaces around arithmetic operators (+, -, *, /, **).
2. E241 -- allows only single spaces after ','.

Something we have done in matplotlib is that we have made PEP8 a part of the tests. We are transitioning, but the idea is that eventually, with Travis, all pull requests will get PEP8-checked. I am very leery of automatic PEP8-ing. I would rather have the tests fail and let me fix things manually than have the code changed automatically.

The first [E226] leaves expression formatting in the hands of the coder and avoids things like 2 ** 3. The second [E241] allows array entries to be vertically aligned, which can be useful in clarifying the values used in tests. A few other things that might need decisions:

1. [:,:, 2] or [:, :, 2]
2. Blank line before the first function after class Foo():

For the first one, I prefer spaces. For the second one, I prefer no blank lines. Cheers! Ben Root
Re: [Numpy-discussion] can argmax be used to return row and column indices?
On Fri, Sep 13, 2013 at 4:27 AM, Mark Bakker mark...@gmail.com wrote: Thanks, Gregorio. I would like it if argmax had a keyword option to return the row, column index automatically (or whatever the dimension of the array). After all, argmax already knows the shape of the array. Calling np.unravel_index(np.argmax(A), A.shape) seems unnecessarily long. But it works well, though! I am not sure that such a PR would get much support. Thanks again, Mark

What should it do when np.argmax() gets an axis=1 argument? I see confusion occurring when parsing the returned results for arbitrary-dimension inputs. -1 on any such PR. +1 on making sure all arg*() functions have unravel_index() very prominent in their documentation (which argmax() does have right now). Ben Root
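For reference, the idiom under discussion, as a minimal sketch (the array values are illustrative):

```python
import numpy as np

# argmax flattens the array and returns a single integer offset;
# unravel_index converts that offset back to an N-d (row, column)
# index using the array's shape.
A = np.array([[1, 9, 2],
              [3, 4, 5]])
idx = np.unravel_index(np.argmax(A), A.shape)
assert idx == (0, 1)      # the maximum, 9, sits at row 0, column 1
assert A[idx] == A.max()  # indexing with the tuple recovers it
```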
Re: [Numpy-discussion] Indexing changes/deprecations
On Fri, Sep 27, 2013 at 8:27 AM, Sebastian Berg sebast...@sipsolutions.net wrote: Hey, since I am working on the indexing, I was wondering about a few smaller things:

* 0-d boolean array: `np.array(0)[True]` (which will work now) would give np.array([0]) as a copy, instead of the original array. I guess I could add a FutureWarning or so, but I am not sure, and overall the chance of creating bugs seems low. (The boolean index should always add 1 dimension and here remove 0 dimensions, giving a 1-d result.)

* All index operations return a view, never the object itself. This means that `v = arr[...]` is slightly slower. But since it does not affect `arr[...] = vals`, I think the speed implications are negligible.

* Does anyone have an idea if there is a way to change the subclass logic that view-based item setting is implemented as: np.asarray(subclass[index]) = vals? I somewhat think the subclass should rather implement `__setitem__` instead of relying on numpy calling its `__getitem__`, but I don't see how it can be changed.

* Still thinking a bit about implementing a keepdims keyword or function, to handle matrix-type logic mostly in the C code.

And most importantly, is there any behaviour thing in the index machinery that is bugging you, which I may have forgotten until now? - Sebastian

Boolean indexing could use a facelift. First, consider the following (albeit minor) annoyance:

>>> import numpy as np
>>> a = np.arange(5)
>>> a[[True, False, True, False, True]]
array([1, 0, 1, 0, 1])
>>> b = np.array([True, False, True, False, True])
>>> a[b]
array([0, 2, 4])

Next, it would be nice if boolean indexing returned a view (wishful thinking, I know):

>>> c = a[b]
>>> c
array([0, 2, 4])
>>> c[1] = 7
>>> c
array([0, 7, 4])
>>> a
array([0, 1, 2, 3, 4])

Cheers! Ben Root
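A small illustration of Ben's two points, with one caveat: in the 2013 NumPy quoted above, a plain Python list of booleans was coerced to the integer indices 0/1 (hence the "annoyance"), whereas modern NumPy treats such a list as a mask too, so the sketch below sticks to an explicit boolean array:

```python
import numpy as np

# Boolean-array indexing selects elements where the mask is True and
# always returns a copy, never a view.
a = np.arange(5)
mask = np.array([True, False, True, False, True])

c = a[mask]
assert c.tolist() == [0, 2, 4]

# Writing to the copy does not touch the original...
c[1] = 7
assert a.tolist() == [0, 1, 2, 3, 4]

# ...but assigning through the mask on the left-hand side does.
a[mask] = [10, 20, 30]
assert a.tolist() == [10, 1, 20, 3, 30]
```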
Re: [Numpy-discussion] [SciPy-Dev] 1.8.0rc1
On Wed, Oct 2, 2013 at 11:43 AM, Charles R Harris charlesr.har...@gmail.com wrote: Hi Stefan, On Wed, Oct 2, 2013 at 9:29 AM, Stéfan van der Walt ste...@sun.ac.za wrote: Hi Chuck, On Tue, Oct 1, 2013 at 1:07 AM, Charles R Harris charlesr.har...@gmail.com wrote: I'll bet the skimage problems come from https://github.com/numpy/numpy/pull/3811. They may be doing something naughty... Reverting that commit fixes those skimage failures. However, there are a number of python2.7 failures that look pretty strange.

What is the exact change in behavior with that PR? I'm trying to figure out what skimage does wrong in this case.

The current master (reverted for the 1.8 release only) is stricter about np.bool only taking the values 0 or 1. Apparently convolve returns boolean output for boolean input (I haven't checked), and consequently the check whether the return value matches the number of 1 elements in the convolution kernel will fail when that number is greater than one. That is why the proposed fix is to view the boolean array as uint8 instead. Note that out=(boolean) will still cause problems. Chuck

So, just to be clear... what would happen if I had an array of floats between 0 and 1 inclusive and I cast it as boolean using astype()? Ben Root
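To answer Ben's question directly: astype(bool) does not round. It maps exactly zero to False and every nonzero value, including 0.5, to True. A quick check, together with the uint8 view workaround Chuck mentions:

```python
import numpy as np

# Casting floats to bool: only exact 0.0 becomes False.
x = np.array([0.0, 0.25, 0.5, 1.0])
b = x.astype(bool)
assert b.tolist() == [False, True, True, True]

# Viewing booleans as uint8 reinterprets the same bytes (True is
# stored as 1), so summing counts the True entries -- the proposed
# fix for the boolean convolution check discussed above.
flags = np.array([True, True, False])
assert flags.view(np.uint8).sum() == 2
```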
Re: [Numpy-discussion] Behavior of nan{max, min} and nanarg{max, min} for all-nan slices.
On Wed, Oct 2, 2013 at 1:05 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Wed, Oct 2, 2013 at 10:56 AM, josef.p...@gmail.com wrote: On Wed, Oct 2, 2013 at 12:37 PM, Stéfan van der Walt ste...@sun.ac.za wrote: On 2 Oct 2013 18:04, Charles R Harris charlesr.har...@gmail.com wrote: The question is what to do when all-nan slices are encountered in the nan{max, min} and nanarg{max, min} functions. Currently in 1.8.0, the first returns nan and raises a warning, the second returns intp.min and raises a warning. It is proposed that the nanarg{max, min} functions, and possibly the nan{max, min} also, raise an error instead.

I agree with Nathan; this sounds like more reasonable behaviour to me.

If I understand what you are proposing: -1 on raising an error with nan{max, min}. An empty array is empty in all columns; an array with nans might be empty in only some columns. As far as I understand, nan{max, min} only make sense with arrays that can hold a nan, so we can return nans.

That was my original thought. If a user calls with ints or bools, then there are either no nans or the array is empty, and I don't care.

As an aside: with nanarg{max, min} I would just return 0 for an all-nan column, since the max or min is nan, and one is at index zero. (But I'm not arguing.)

That is an interesting proposal. I like it. Chuck

And it is logically consistent, I think: a[nanargmax(a)] == nanmax(a) (ignoring the silly detail that you can't do an equality test on nans). Ben Root
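Ben's consistency property can be checked directly. The all-NaN behaviour shown below is what ended up in released NumPy: nan{max, min} warn and return nan, while nanarg{max, min} raise a ValueError rather than returning 0 or intp.min (the array values are illustrative):

```python
import warnings
import numpy as np

# With some NaNs present, nanargmax points at the position of nanmax.
a = np.array([np.nan, 3.0, np.nan, 7.0, 5.0])
i = np.nanargmax(a)
assert i == 3
assert a[i] == np.nanmax(a)

# All-NaN input: nanmax emits a RuntimeWarning and returns nan...
b = np.array([np.nan, np.nan])
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    assert np.isnan(np.nanmax(b))

# ...while nanargmax raises, since no valid index exists.
try:
    np.nanargmax(b)
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for all-NaN slice")
```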