Re: [Numpy-discussion] ANN: pandas v0.18.0rc1 - RELEASE CANDIDATE
On 15 Feb 2016, at 6:55 pm, Jeff Reback wrote:
>
> https://github.com/pydata/pandas/releases/tag/v0.18.0rc1

Ah, I think I forgot about the 'releases' pages.

Built on OS X 10.10 + 10.11 with Python 2.7.11, 3.4.4 and 3.5.1.
17 errors in the test suite + 1 failure with Python 2.7 only; I can send you
details on the errors if desired - the majority seem to be generic to a
urllib problem with openssl on OS X anyway.

Thanks for the good work,

Derek
Re: [Numpy-discussion] ANN: pandas v0.18.0rc1 - RELEASE CANDIDATE
On 14 Feb 2016, at 1:53 am, Jeff Reback wrote:
>
> I'm pleased to announce the availability of the first release candidate
> of Pandas 0.18.0.
> Please try this RC and report any issues here: Pandas Issues
> We will be releasing officially in 1-2 weeks or so.

Thanks, looking forward to giving this a try! Do you have a download link
to the source for non-Conda users and packagers? Finding anything in the
github source tarball repositories without having the exact path seems
hopeless.

Derek
Re: [Numpy-discussion] Numpy 1.11.0b1 is out
> On 31 Jan 2016, at 9:48 am, Sebastian Berg <sebast...@sipsolutions.net> wrote:
>
> On Sa, 2016-01-30 at 20:27 +0100, Derek Homeier wrote:
>> On 27 Jan 2016, at 1:10 pm, Sebastian Berg <sebast...@sipsolutions.net> wrote:
>>>
>>> On Mi, 2016-01-27 at 11:19, Nadav Horesh wrote:
>>>> Why is the dot function/method slower than @ on python 3.5.1? Tested
>>>> from the latest 1.11 maintenance branch.
>>>>
>>>
>>> The explanation I think is that you do not have a blas optimization. In
>>> which case the fallback mode is probably faster in the @ case (since it
>>> has SSE2 optimization by using einsum, while np.dot does not do that).
>>
>> I am a bit confused now, as A @ c is just short for A.__matmul__(c) or
>> equivalent to np.matmul(A,c), so why would these not use the optimised blas?
>> Also, I am getting almost identical results on my Mac, yet I thought numpy
>> would by default build against the VecLib optimised BLAS. If I build
>> explicitly against ATLAS, I am actually seeing slightly slower results.
>> But I also saw these kind of warnings on the first timeit runs:
>>
>> %timeit A.dot(c)
>> The slowest run took 6.91 times longer than the fastest. This could
>> mean that an intermediate result is being cached
>>
>> and when testing much larger arrays, the discrepancy between matmul and
>> dot rather increases, so perhaps this is more an issue of a less
>> memory-efficient implementation in np.dot?
>
> Sorry, I missed the fact that one of the arrays was 3D. In that case I
> am not even sure which of the functions call into blas or what else
> they have to do, would have to check. Note that `np.dot` uses a
> different type of combining high dimensional arrays. @/matmul
> broadcasts extra axes, while np.dot will do the outer combination of
> them, so that the result is:
>
> As = list(A.shape)
> As.pop(-1)
> cs = list(c.shape)
> cs.pop(-2)  # if possible
> result_shape = As + cs
>
> which happens to be identical as long as only A.ndim > 2 and c.ndim <= 2.

Makes sense now; with A.ndim = 2 both operations take about the same time
(and are ~50% faster with VecLib than with ATLAS) and yield identical
results, while any additional dimension in A adds more overhead time to
np.dot, and the results are np.allclose, but not exactly identical.

Thanks,

Derek
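A minimal sketch illustrating the shape difference described above (the
array shapes are chosen purely for illustration):

    import numpy as np

    A = np.ones((2, 3, 4))   # a "stack" of two 3x4 matrices
    c = np.ones((2, 4, 5))   # a matching stack of two 4x5 matrices

    # matmul/@ broadcasts the leading axes and multiplies the matrix pairs:
    print((A @ c).shape)        # -> (2, 3, 5)

    # np.dot forms the outer combination of the non-contracted axes instead:
    print(np.dot(A, c).shape)   # -> (2, 3, 2, 5)

    # With c.ndim <= 2 the two conventions coincide:
    b = np.ones((4, 5))
    print((A @ b).shape, np.dot(A, b).shape)   # -> (2, 3, 5) (2, 3, 5)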
Re: [Numpy-discussion] Numpy 1.11.0b1 is out
On 27 Jan 2016, at 1:10 pm, Sebastian Berg wrote:
>
> On Mi, 2016-01-27 at 11:19, Nadav Horesh wrote:
>> Why is the dot function/method slower than @ on python 3.5.1? Tested
>> from the latest 1.11 maintenance branch.
>>
>
> The explanation I think is that you do not have a blas optimization. In
> which case the fallback mode is probably faster in the @ case (since it
> has SSE2 optimization by using einsum, while np.dot does not do that).

I am a bit confused now, as A @ c is just short for A.__matmul__(c) or
equivalent to np.matmul(A,c), so why would these not use the optimised blas?
Also, I am getting almost identical results on my Mac, yet I thought numpy
would by default build against the VecLib optimised BLAS. If I build
explicitly against ATLAS, I am actually seeing slightly slower results.
But I also saw these kind of warnings on the first timeit runs:

%timeit A.dot(c)
The slowest run took 6.91 times longer than the fastest. This could
mean that an intermediate result is being cached

and when testing much larger arrays, the discrepancy between matmul and dot
rather increases, so perhaps this is more an issue of a less memory-efficient
implementation in np.dot?

Cheers,

Derek
Re: [Numpy-discussion] [SciPy-Dev] Numpy 1.11.0b1 is out
On 27 Jan 2016, at 2:58 AM, Charles R Harris wrote:
>
>> FWIW, the maintenance/1.11.x branch (there is no tag for the beta?)
>> builds and passes all tests with Python 2.7.11 and 3.5.1 on Mac OS X 10.10.
>
> You probably didn't fetch the tags; if they can't be reached from the
> branch head they don't download automatically. Try `git fetch --tags upstream`

Thanks, that did it. Successfully tested v1.11.0b1 on 10.11 and with
Python 2.7.8 and 3.4.1 on openSUSE 13.2 as well.

Derek
Re: [Numpy-discussion] [SciPy-Dev] Numpy 1.11.0b1 is out
Hi Chuck,

> I'm pleased to announce that Numpy 1.11.0b1 is now available on
> sourceforge. This is a source release as the mingw32 toolchain is broken.
> Please test it out and report any errors that you discover. Hopefully we
> can do better with 1.11.0 than we did with 1.10.0 ;)

the tarball seems to be incomplete, hope that does not bode ill ;-)

adding 'build/src.macosx-10.10-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources.
executing numpy/core/code_generators/generate_numpy_api.py
error: [Errno 2] No such file or directory: 'numpy/core/code_generators/../src/multiarray/arraytypes.c.src'

> tar tvf /sw/src/numpy-1.11.0b1.tar.gz | grep arraytypes
-rw-rw-r-- charris/charris 62563 2016-01-21 20:38 numpy-1.11.0b1/numpy/core/include/numpy/ndarraytypes.h
-rw-rw-r-- charris/charris   981 2016-01-21 20:38 numpy-1.11.0b1/numpy/core/src/multiarray/arraytypes.h

FWIW, the maintenance/1.11.x branch (there is no tag for the beta?) builds
and passes all tests with Python 2.7.11 and 3.5.1 on Mac OS X 10.10.

Cheers,

Derek
Re: [Numpy-discussion] performance solving system of equations in numpy and MATLAB
On 16 Dec 2015, at 8:22 PM, Matthew Brett wrote:
>
>>> In [4]: %time testx = np.linalg.solve(testA, testb)
>>> CPU times: user 1min, sys: 468 ms, total: 1min 1s
>>> Wall time: 15.3 s
>>>
>>> so, it looks like you will need to buy a MKL license separately (which
>>> makes sense for a commercial product).
>
> If you're on a recent Mac, I would guess that the default
> Accelerate-linked numpy / scipy will be in the same performance range
> as those linked to the MKL, but I am happy to be corrected.

Getting around 30 s wall time here on a not so recent 4-core iMac, so that
would seem to fit (iirc Accelerate should actually largely be using the same
machine code as MKL).

Cheers,

Derek
Re: [Numpy-discussion] deprecate fromstring() for text reading?
On 3 Nov 2015, at 6:03 pm, Chris Barker - NOAA Federal wrote:
>
>> I was more aiming to point out a situation where NumPy's text file reader
>> was significantly better than the Pandas version, so we would want to
>> make sure that we properly benchmark any significant changes to NumPy's
>> text reading code. Who knows where else NumPy beats Pandas?
>
> Indeed. For this example, I think a fixed-width reader really is a
> different animal, and it's probably a good idea to have a high performance
> one in Numpy. Among other things, you wouldn't want it to try to
> auto-determine data types or anything like that.
>
> I think what's on the table now is to bring in a new delimited reader --
> i.e. CSV in its various flavors.

To add my own handful of change, or at least another data point: I had been
looking into both the pandas and the Astropy fast readers as a fast
loadtxt/genfromtxt replacement; at the time I found the Astropy cparser
source somewhat easier to dig into, although looking now, Pandas' parser.pyx
seems clear enough as well. Some comparison of the two can be found at
http://astropy.readthedocs.org/en/stable/io/ascii/fast_ascii_io.html#speed-gains

Unfortunately the Astropy fast reader currently does not support fixed-width
format either, and adding this functionality would require modifications to
the tokenizer C code - not sure how extensive.

Cheers,

Derek
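For reference, a minimal sketch of invoking the Astropy fast reader
mentioned above (the file name is hypothetical):

    from astropy.io import ascii

    # fast_reader=True requests the C tokenizer; astropy falls back to the
    # pure-Python reader for formats the C parser does not support
    # (e.g. fixed-width)
    table = ascii.read('data.csv', format='csv', fast_reader=True)
    print(table.colnames)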
Re: [Numpy-discussion] ANN: HDF5 for Python 2.5.0
On 9 Apr 2015, at 9:41 pm, Andrew Collette andrew.colle...@gmail.com wrote:
>
>> Congrats! Also btw, you might want to switch to a new subject line format
>> for these emails -- the mention of Python 2.5 getting hdf5 support made
>> me do a serious double take before I figured out what was going on, and
>> 2.6 and 2.7 will be even worse :-)
>
> Ha! Didn't even think of that. For our next release I guess we'll have to
> go straight to h5py 3.5.

You may have to hurry though ;-)

"Monday, March 30, 2015
Python 3.5.0a3 has been released. This is the third alpha release of
Python 3.5, which will be the next major release of Python. Python 3.5 is
still under heavy development and is far from complete."

3 alpha releases in 7 weeks…

On a more serious note though, "h5py 2.5.x" in the subject would be
perfectly clear enough, I think, and also help to distinguish from pytables
releases.

Derek
Re: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt
On 26 Oct 2014, at 02:21 pm, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote:
>
> I'm not sure why the memory doubling is necessary. Isn't it possible to
> preallocate the arrays and write to them? I suppose this might be
> inefficient though, in case you end up reading only a small subset of
> rows out of a mostly corrupt file? But that seems to be a rather uncommon
> corner case. Either way, I'd say a doubling of memory use is fair game
> for numpy. Generality is more important than absolute performance. The
> most important thing is that temporary python datastructures are avoided.
> That shouldn't be too hard to accomplish, and would realize most of the
> performance and memory gains, I imagine.

Preallocation is not straightforward because the parser needs to be able,
in general, to work with streamed input. I think I even still have a branch
on github bypassing this on request (by keyword argument). But a factor 2
is already a huge improvement over the factor ~6 coming from the current
text readers buffering the entire input as a list of lists of Python
strings, not to speak of the vast performance gain from using a parser
implemented in C like pandas' - in fact, one of the last times this subject
came up, one suggestion was to steal pandas.read_csv and adapt it as
required.
Someone also posted some code, or a draft thereof, for using resizable
arrays quite a while ago, which would reduce the memory overhead for very
large arrays; see the sketch below.

Cheers,

Derek
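A minimal sketch of that resizable-array idea - growing the output buffer
geometrically while parsing a stream and trimming once the row count is
known (illustrative only, not the code originally posted):

    import numpy as np

    def read_stream(lines, ncols, dtype=float):
        """Parse whitespace-separated numbers from an iterable of lines
        into a 2-D array, without building Python lists of rows."""
        buf = np.empty((1024, ncols), dtype=dtype)
        n = 0
        for line in lines:
            if n == len(buf):                   # buffer full: double it
                new = np.empty((2 * n, ncols), dtype=dtype)
                new[:n] = buf
                buf = new
            buf[n] = np.array(line.split(), dtype=dtype)
            n += 1
        return buf[:n].copy()                   # trim to the rows actually read

e.g. read_stream(open('data.txt'), 3) for a three-column file; the peak
overhead stays bounded by the factor ~2 discussed above.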
Re: [Numpy-discussion] Changed behavior of np.gradient
On 4 Oct 2014, at 08:37 pm, Ariel Rokem aro...@gmail.com wrote:
>
> >>> import numpy as np
> >>> np.__version__
> '1.9.0'
> >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float))
> [array([[ 2.,  2., -1.],
>        [ 2.,  2., -1.]]), array([[-0.5,  2.5,  5.5],
>        [ 1. ,  1. ,  1. ]])]
>
> On the other hand:
>
> >>> import numpy as np
> >>> np.__version__
> '1.8.2'
> >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float))
> [array([[ 2.,  2., -1.],
>        [ 2.,  2., -1.]]), array([[ 1. ,  2.5,  4. ],
>        [ 1. ,  1. ,  1. ]])]
>
> For what it's worth, the 1.8 version of this function seems to be in
> agreement with the Matlab equivalent function ('gradient'):
>
> >> gradient([[1, 2, 6]; [3, 4, 5]])
> ans =
>     1.    2.5000    4.
>     1.    1.        1.
>
> This seems like a regression to me, but maybe it's an improvement?

Technically yes, the function has been changed to use 2nd-order differences
where possible, as is described in the docstring. Someone missed updating
the example though, which still quotes the 1.8 results. And if the loss of
Matlab-compliance is seen as a disadvantage, maybe there is a case for
re-enabling the old behaviour via a keyword argument?

Cheers,

Derek
Re: [Numpy-discussion] Changed behavior of np.gradient
Hi Ariel,

> I think that the docstring in 1.9 is fine (has the 1.9 result). The docs
> online (for all of numpy) are still on version 1.8, though.
> I think that enabling the old behavior might be useful, if only so that I
> can write code that behaves consistently across these two versions of
> numpy. For now, I might just copy over the 1.8 code into my project.

Hmm, I got this with 1.9.0:

    Examples
    --------
    >>> x = np.array([1, 2, 4, 7, 11, 16], dtype=np.float)
    >>> np.gradient(x)
    array([ 1. ,  1.5,  2.5,  3.5,  4.5,  5. ])
    >>> np.gradient(x, 2)
    array([ 0.5 ,  0.75,  1.25,  1.75,  2.25,  2.5 ])
    >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float))
    [array([[ 2.,  2., -1.],
           [ 2.,  2., -1.]]), array([[ 1. ,  2.5,  4. ],
           [ 1. ,  1. ,  1. ]])]

while the function actually returns

    In [5]: x = np.array([1, 2, 4, 7, 11, 16], dtype=np.float)

    In [6]: print(np.gradient(x))
    [ 0.5  1.5  2.5  3.5  4.5  5.5]

    In [7]: print(np.gradient(x, 2))
    [ 0.25  0.75  1.25  1.75  2.25  2.75]

…

I think there is a point for supporting the old behaviour besides
backwards-compatibility or any sort of Matlab-compliance, as I'd probably
like to be able to restrict a function to linear/1st-order differences in
cases where I know the input to be not well-behaved.
+1 for an order=2 or maxorder=2 flag

Cheers,

Derek
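For the record, a keyword along these lines did land later: np.gradient
grew an edge_order argument (in NumPy 1.9.1, if I recall correctly), so the
1.8-style boundary treatment can be requested explicitly:

    import numpy as np

    x = np.array([1, 2, 4, 7, 11, 16], dtype=float)
    # first-order one-sided edges, as in 1.8 -> [ 1.   1.5  2.5  3.5  4.5  5. ]
    print(np.gradient(x, edge_order=1))
    # second-order edges, the 1.9.0 default -> [ 0.5  1.5  2.5  3.5  4.5  5.5]
    print(np.gradient(x, edge_order=2))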
Re: [Numpy-discussion] Bug in genfromtxt with usecols and converters
On 26 Aug 2014, at 09:05 pm, Adrian Altenhoff adrian.altenh...@inf.ethz.ch wrote:
>
>> But you are right that the problem with using the first_values, which
>> should of course be valid, somehow stems from the use of usecols; it
>> seems that in that loop
>>
>> for (i, conv) in user_converters.items():
>>
>> i in user_converters and in usecols get out of sync.
>> This certainly looks like a bug, the entire way of modifying i inside
>> the loop appears a bit dangerous to me. I'll have a look if I can make
>> this safer.
> Thanks.
>> As long as your data don't actually contain any missing values you
>> might also simply use np.loadtxt.
> Ok, wasn't aware of that function so far. I will try that!

It was first_values that needed to be addressed by the original indices.
I have created a short test from your case and submitted a fix at
https://github.com/numpy/numpy/pull/5006

Cheers,

Derek
Re: [Numpy-discussion] Bug in genfromtxt with usecols and converters
Hi Adrian,

> I tried to load data from a csv file into numpy using genfromtxt. I need
> only a subset of the columns and want to apply some conversions to the
> data. Attached is a minimal script showing the error. In brief, I want to
> load columns 1, 2 and 4. But in the converter function for the 4th column,
> I get the 3rd value. The issue does not occur if I also load the 3rd
> column. Did I somehow misunderstand how the function is supposed to work,
> or is this indeed a bug?

Not sure whether to call it a bug; the error seems to arise before reading
any actual data (even on reading from an empty string): when genfromtxt is
checking the filling_values used to substitute missing or invalid data, it
is apparently testing on default testing values of '1' or '-1', which your
conversion scheme does not know about. Although I think it is rather the
user's responsibility to provide valid converters, probably the
documentation should at least be updated to make them aware of this
requirement. I see two possible fixes/workarounds: provide a keyword
argument

    filling_values=[0, 0, '1:1']

or add the default filling values to your relEnum dictionary, e.g.

    { … '-1': -1, '1': -1}

Could you check if this works for your case?

HTH,

Derek
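A minimal sketch of the second workaround; the data layout and the relEnum
mapping are made up here to mirror the report:

    import numpy as np
    from io import StringIO

    data = StringIO(u"a,0,10,1:1\nb,1,20,2:1\n")
    relEnum = {'1:1': 0, '2:1': 1,
               '-1': -1, '1': -1}   # include genfromtxt's default test values

    # depending on the numpy version the converter may receive bytes or str,
    # hence the defensive decode
    conv = lambda s: relEnum[s.decode() if isinstance(s, bytes) else s]
    arr = np.genfromtxt(data, delimiter=',', usecols=(1, 2, 3), dtype=int,
                        converters={3: conv})
    print(arr)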
Re: [Numpy-discussion] Bug in genfromtxt with usecols and converters
Hi Adrian,

>> not sure whether to call it a bug; the error seems to arise before
>> reading any actual data (even on reading from an empty string): when
>> genfromtxt is checking the filling_values used to substitute missing or
>> invalid data, it is apparently testing on default testing values of '1'
>> or '-1', which your conversion scheme does not know about. Although I
>> think it is rather the user's responsibility to provide valid
>> converters, probably the documentation should at least be updated to
>> make them aware of this requirement. I see two possible
>> fixes/workarounds: provide a keyword argument filling_values=[0, 0, '1:1']
> This workaround seems to work, but I doubt that the actual problem is the
> converter function I pass. The '-1' which is used as the testing value is
> the first_values from the 3rd column (line 1574 in npyio.py), but the
> converter is defined for column 4. Setting the filling_values to an array
> of length 3 obviously makes the problem disappear. But I think if the
> first row is used, it should also use the values from the column for
> which the converter is defined.

It is certainly related to the converter function, because a KeyError for
the dictionary you provide is raised:

  File "test.py", line 13, in <module>
    3: lambda rel: relEnum[rel.decode()]})
  File "/sw/lib/python3.4/site-packages/numpy/lib/npyio.py", line 1581, in genfromtxt
    missing_values=missing_values[i],)
  File "/sw/lib/python3.4/site-packages/numpy/lib/_iotools.py", line 784, in update
    tester = func(testing_value or asbytes('1'))
  File "test.py", line 13, in <lambda>
    3: lambda rel: relEnum[rel.decode()]})
KeyError: '-1'

But you are right that the problem with using the first_values, which
should of course be valid, somehow stems from the use of usecols; it seems
that in that loop

    for (i, conv) in user_converters.items():

i in user_converters and in usecols get out of sync.
This certainly looks like a bug, the entire way of modifying i inside the
loop appears a bit dangerous to me. I'll have a look if I can make this
safer.
As long as your data don't actually contain any missing values you might
also simply use np.loadtxt.

Cheers,

Derek
Re: [Numpy-discussion] ANN: NumPy 1.8.2 release candidate
On 5 Aug 2014, at 11:27 pm, Matthew Brett matthew.br...@gmail.com wrote:
>
> OSX wheels built and tested and uploaded OK:
>
> http://wheels.scikit-image.org
> https://travis-ci.org/matthew-brett/numpy-atlas-binaries/builds/31747958
>
> Will test against the scipy stack later on today.

Built and tested against the Fink Python installation under OSX. Seems to
resolve one of a couple of f2py test errors appearing with 1.8.1 on
Python 3.3 and 3.4:

======================================================================
ERROR: test_return_real.TestCReturnReal.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sw/lib/python3.4/site-packages/nose/case.py", line 382, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/sw/lib/python3.4/site-packages/nose/util.py", line 470, in try_run
    return func()
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: No module named 'c_ext_return_real'

is gone on 3.4 now, but still present on 3.3. Two errors of this kind (with
different numbers) remain:

ERROR: test_return_real.TestF90ReturnReal.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sw/lib/python3.4/site-packages/nose/case.py", line 382, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/sw/lib/python3.4/site-packages/nose/util.py", line 470, in try_run
    return func()
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: No module named '_test_ext_module_5415'

NumPy version 1.8.2rc1
NumPy is installed in /sw/lib/python3.4/site-packages/numpy
Python version 3.4.1 (default, Aug 3 2014, 21:02:44) [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
nose version 1.3.3

Cheers,

Derek
Re: [Numpy-discussion] length - sticks algorithm
On 29 Jul 2014, at 02:43 pm, Robert Kern robert.k...@gmail.com wrote:
>
> On Tue, Jul 29, 2014 at 12:47 PM, Josè Luis Mietta
> joseluismie...@yahoo.com.ar wrote:
>>
>> Robert, thanks for your help! Now I have:
>> * Q nodes (Q stick-stick intersections)
>> * a list 'NODES' = [(x,y,i,j)_1, …, (x,y,i,j)_Q], where each element
>>   (x,y,i,j) represents the intersection point (x,y) of the sticks i and j.
>> * a matrix 'H' with Q elements {H_k,l}. H_k,l = 0 if nodes 'k' and 'l'
>>   aren't joined by an edge, and H_k,l = R_k,l = the electrical resistance
>>   associated with the union of the nodes 'k' and 'l' (directly
>>   proportional to the length of the edge that connects these nodes).
>> * a list 'nodes_resistances' = [R_1, …, R_Q].
>> All nodes with 'j' (or 'i') = N+1 have an electric potential 'V' with
>> respect to all nodes with 'j' or 'i' = N.
>> Now I must apply NODAL ANALYSIS to determine the electrical current
>> through each of the edges, and the net current (see attached files).
>> I have no ideas about how to do that. Can you help me?
>
> Please do not send largish binary attachments to this list. I do not know
> off-hand how to do this, but it looks like the EE201 document you
> attached tells you how. It is somewhat beyond the scope of this mailing
> list to help you understand that document, sorry.

And it is not a good idea to post copyrighted journal articles to a list
where they will end up in a public list archive (even if not immediately
recognisable as such).

Derek
Re: [Numpy-discussion] problems with mailing list ?
On 18 Jul 2014, at 01:07 pm, josef.p...@gmail.com wrote:
>
> Are there problems with sending out the messages with the mailing lists?
> I'm getting some replies without original messages, and in some threads I
> don't get replies, missing part of the discussions.

There seem to be problems with the Scipy list server; my last mails to
astr...@scipy.org have taken 12-18 hours before they made it to the list,
and some people here reported messages staying in the void for several
days. But I think it's been reported to Enthought already.

Derek
[Numpy-discussion] genfromtxt universal newline support
Hi all,

I was just having a new look into the mess that is, imo, the support for
automatic line ending recognition in genfromtxt, and more generally, the
Python file openers. I am glad at least reading gzip files is no longer
entirely broken in Python3, but actually detecting in particular "old Mac"
style CR line endings currently only works for uncompressed and bzip2 files
under 2.6/2.7. This is largely because genfromtxt wants to open everything
in binary mode, which arguably makes no sense for ASCII text files with
numbers. I think the only reason this works in 2.x is that the 'U' reading
mode overrides the 'b'. So on the Python side, what actually works for
automatic line ending detection is:

    Python          2.6   2.7   3.2   3.3/3.4
    uncompressed:    U     U     t     t
    gzip:            E     N     E     t
    bzip2:           U     U     E     t*
    lzma:            -     -     -     t*

    U - works with mode 'rU'
    E - mode 'rU' raises an error
    N - mode 'rU' is accepted, but does not detect CR ('\r') line endings
        (actually I think 'U' is simply internally discarded by gzip.open()
        in 2.7.4+)
    t - works with mode 'rt' (default with plain open())
    * - requires the '.open()' rather than the '.XXXFile()' method of
        bz2/lzma

Therefore I'd propose the changes in
https://github.com/dhomeier/numpy/commit/995ec93
to extend universal newline recognition as far as possible with the above
openers. There are some potential issues with this:

1. Switching to 'rt' mode for Python3.x means that np.lib._datasource.open()
does not return byte strings by itself, so genfromtxt has to use asbytes()
on the returned lines. Since this occurs only in two places, I don't see a
major problem with this.

2. In the tests I had to work around the lack of fileobj support in
bz2.BZ2File by using os.system('bzip2 …') on the temporary file, which
might not work on all systems. In particular I'd expect it to fail under
Windows, but it's not clear to me how far the entire mkstemp thing works
under Windows...

As a final note, http://bugs.python.org/issue13989#msg153127 suggests a
workaround that might make this work with gzip.open() (and perhaps bz2?)
on 3.2 as well. I am not sure how high 3.2 support is ranking for the near
future; for the moment I am not strongly inclined to implement it…

Grateful for comments or tests (especially under Windows!) of the
commit(s) above -

Derek
Re: [Numpy-discussion] genfromtxt universal newline support
On 30 Jun 2014, at 04:39 pm, Nathaniel Smith n...@pobox.com wrote:
>
> On Mon, Jun 30, 2014 at 12:33 PM, Julian Taylor
> jtaylor.deb...@googlemail.com wrote:
>> genfromtxt and loadtxt need an almost full rewrite to fix the botched
>> python3 conversion of these functions. There are a couple threads about
>> this on this list already. There are numerous PRs fixing stuff in these
>> functions which I currently all -1'd because we need to fix the
>> underlying unicode issues first.
>> I have a PR where I started this for loadtxt, but it is incredibly
>> annoying to try to support all the broken use cases the function
>> accidentally supported. 1.9 beta still uses the broken functions because
>> I had no time to get this done correctly. But we should probably put a
>> big fat future warning into the release notes that genfromtxt and
>> loadtxt may stop working for your binary streams.
>
> What binary streams?
>
>> That will probably allow us to start fixing these functions.
>
> +1 to doing the proper fix instead of piling up buggy hacks. Do we
> understand the difference between the current code and the proper code
> well enough to detect cases where they differ and issue warnings in
> those cases specifically?

Does it make sense to keep maintaining both functions at all? IIRC the
idea that loadtxt would be the faster version of the two has been
discarded long ago, thus it seems there is very little, if anything,
loadtxt can do that cannot be done just as well by genfromtxt. The main
compatibility issue is probably the different default behaviour and
interface of the two, but perhaps that might be best solved by replacing
loadtxt with another genfromtxt wrapper?

A real need, which had also been discussed at length, is a truly performant
text IO function (i.e. one using a compiled ASCII number parser, and
optimally also a more memory-efficient one), but unfortunately all people
interested in implementing this seem to have drifted away (not excluding
myself from this)…

Cheers,

Derek
Re: [Numpy-discussion] genfromtxt universal newline support
On 30 Jun 2014, at 04:56 pm, Nathaniel Smith n...@pobox.com wrote:
>
>> A real need, which had also been discussed at length, is a truly
>> performant text IO function (i.e. one using a compiled ASCII number
>> parser, and optimally also a more memory-efficient one), but
>> unfortunately all people interested in implementing this seem to have
>> drifted away (not excluding myself from this)…
>
> It's possible we could steal some code from Pandas for this. IIRC they
> have C/Cython text parsing routines. (It's also an interesting question
> whether they've fixed the unicode/binary issues, might be worth checking
> before rewriting from scratch...)

Good point; last time I was playing with Pandas it was not any faster, but
now a 10x speedup speaks for itself. Their C engine does not support
generic whitespace separators, but that could probably be addressed in a
numpy implementation.

Derek
Re: [Numpy-discussion] genfromtxt universal newline support
On 30.06.2014, at 23:10, Jeff Reback jeffreb...@gmail.com wrote:
>
> In pandas 0.14.0, generic whitespace IS parsed via the c-parser, e.g.
> specifying '\s+' as a separator. Not sure when you last played with
> pandas, but the c-parser has been in place since late 2012 (version 0.8.0).
> http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#text-parsing-api-changes

Ah, I did not see the '\s' syntax in the documentation and thought ' *'
would be the only option.

Thanks,

Derek
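A quick sketch of the whitespace-separator usage in question (the file name
is hypothetical; with pandas >= 0.14.0 this is handled by the C engine):

    import pandas as pd

    # r'\s+' matches runs of arbitrary whitespace between fields
    df = pd.read_csv('data.txt', sep=r'\s+', header=None)
    print(df.head())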
Re: [Numpy-discussion] ANN: NumPy 1.7.2rc1 release
On 13.11.2013, at 3:07AM, Charles R Harris charlesr.har...@gmail.com wrote:
>
> Python 2.4 fixes at https://github.com/numpy/numpy/pull/4049.

Thanks for the fixes; builds under OS X 10.5 now as well. There are two
test errors (or maybe a nose problem?):

NumPy version 1.7.2rc1
NumPy is installed in /sw/src/fink.build/root-numpy-py24-1.7.2rc1-1/sw/lib/python2.4/site-packages/numpy
Python version 2.4.4 (#1, Jan 5 2011, 03:05:41) [GCC 4.0.1 (Apple Inc. build 5493)]
nose version 1.3.0
...
ERROR: Failure: SkipTest (Skipping test: test_special_values
Numpy is using complex functions (e.g. sqrt) provided by your platform's C
library. However, they do not seem to behave according to C99 -- so C99
tests are skipped.)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sw/lib/python2.4/site-packages/nose/failure.py", line 37, in runTest
    if isinstance(self.exc_val, BaseException):
NameError: global name 'BaseException' is not defined

======================================================================
ERROR: Failure: SkipTest (Skipping test: test_special_values
Numpy is using complex functions (e.g. sqrt) provided by your platform's C
library. However, they do not seem to behave according to C99 -- so C99
tests are skipped.)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sw/lib/python2.4/site-packages/nose/failure.py", line 37, in runTest
    if isinstance(self.exc_val, BaseException):
NameError: global name 'BaseException' is not defined

Cheers,

Derek
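For context, BaseException only exists from Python 2.5 on (PEP 352), which
is why nose's own failure handler raises the NameError above when run under
2.4; a minimal 2.4-compatible guard would look like:

    # BaseException was introduced in Python 2.5; fall back on older versions
    try:
        BaseException
    except NameError:
        BaseException = Exception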
Re: [Numpy-discussion] ANN: NumPy 1.7.2rc1 release
Hi,

On 03.11.2013, at 5:42PM, Julian Taylor jtaylor.deb...@googlemail.com wrote:
>
> I'm happy to announce the release candidate of Numpy 1.7.2. This is a
> bugfix only release supporting Python 2.4 - 2.7 and 3.1 - 3.3.

On OS X 10.5, build and tests succeed for Python 2.5-3.3, but Python 2.4.4
fails with

/sw/bin/python2.4 setup.py build
Running from numpy source directory.
Traceback (most recent call last):
  File "setup.py", line 214, in ?
    setup_package()
  File "setup.py", line 191, in setup_package
    from numpy.distutils.core import setup
  File "/sw/src/fink.build/numpy-py24-1.7.2rc1-1/numpy-1.7.2rc1/numpy/distutils/core.py", line 25, in ?
    from numpy.distutils.command import config, config_compiler, \
  File "/sw/src/fink.build/numpy-py24-1.7.2rc1-1/numpy-1.7.2rc1/numpy/distutils/command/build_ext.py", line 16, in ?
    from numpy.distutils.system_info import combine_paths
  File "/sw/src/fink.build/numpy-py24-1.7.2rc1-1/numpy-1.7.2rc1/numpy/distutils/system_info.py", line 235
    finally:
          ^
SyntaxError: invalid syntax

Cheers,

Derek
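For context, a unified try/except/finally block is only valid from
Python 2.5 on (PEP 341), which is what trips up the 2.4 parser here; a
sketch of the two spellings (work/handle/cleanup are placeholder names):

    # Python >= 2.5: except and finally may share one try statement
    try:
        work()
    except ValueError:
        handle()
    finally:
        cleanup()

    # Python 2.4: the same logic must nest two try statements
    try:
        try:
            work()
        except ValueError:
            handle()
    finally:
        cleanup()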
Re: [Numpy-discussion] Removal of numarray and oldnumeric packages.
On 23.09.2013, at 7:03PM, Charles R Harris charlesr.har...@gmail.com wrote:
>
> I have gotten no feedback on the removal of the numarray and oldnumeric
> packages. Consequently the removal will take place on 9/28. Scream now or
> never...

The only thing I'd care about is the nd_image subpackage, but as far as I
can see, that's already just a wrapper to import scipy.ndimage. I take it
there are no pure numpy implementations for the likes of map_coordinates,
right?

Cheers,

Derek
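For anyone landing here, a minimal sketch of the scipy.ndimage equivalent
referred to above:

    import numpy as np
    from scipy import ndimage

    a = np.arange(12.).reshape(3, 4)
    # interpolate a at the (row, column) positions (0.5, 0.5) and (2.0, 1.5)
    coords = np.array([[0.5, 2.0],     # row coordinates
                       [0.5, 1.5]])    # column coordinates
    print(ndimage.map_coordinates(a, coords, order=1))  # bilinear -> [2.5 9.5]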
Re: [Numpy-discussion] genfromtxt and gzip
On 05.06.2013, at 9:52AM, Ted To rainexpec...@theo.to wrote:
>
> From the list archives (2011), I noticed that there is a bug in the
> python gzip module that causes genfromtxt to fail with python 2, but this
> bug is not a problem for python 3. When I tried to use genfromtxt and
> python 3 with a gzip'ed csv file, I instead got:
>
> IOError: Mode rbU not supported
>
> Is this a bug? I am using python 3.2.3 and numpy 1.7.1 from the
> experimental Debian repository.

Interesting, it used to be the other way round indeed - at least Python3's
gzip module was believed to work with 'U' mode (universal newline
conversion). This was apparently fixed in Python 2.7.3:
http://bugs.python.org/issue5148
but from the closing comment I'd take it should indeed _not_ be used in
Python 3:

"The data corruption issue is now fixed in the 2.7 branch. In 3.x, using a
mode containing 'U' results in an exception rather than silent data
corruption. Additionally, gzip.open() has supported text modes (rt/wt/at)
and newline translation since 3.3"

Checking the various Python versions on OS X 10.8 I found:

2.6.8: fails similarly to the older 2.x, i.e. gzip opens with 'rbU', but
then fails upon reading (possibly randomly) with

/sw/lib/python2.6/gzip.pyc in _read_eof(self)
    302         if crc32 != self.crc:
    303             raise IOError("CRC check failed %s != %s" % (hex(crc32),
--> 304                                                          hex(self.crc)))

2.7.5: works as to be expected with the resolution of 5148 above.
3.1.5: works as well, which could just mean that the exception mentioned
above has not made it into the 3.1.x branch…
3.2.5 + 3.3.2: gzip.open raises the exception as documented.

This looks like the original issue - that the universal newline conversion
should not be passed to gzip.open (where it is meaningless or even
harmful) - is still not resolved; ideally the 'U' flag should probably be
caught in _datasource.py. I take it from the comments on issue 5148 that
3.3's gzip module offers alternative methods to do the newline conversion,
but for 3.1, 3.2 and 2.6 this might still have to be done within either
_datasource.py or genfromtxt; however I have no idea if anyone has come up
with a good solution for this by now…

Cheers,

Derek
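As a workaround sketch on Python 3.3+, one can open the gzip file in text
mode oneself and hand the file object to genfromtxt (file name
hypothetical; whether genfromtxt accepts a str-yielding handle depends on
the numpy version):

    import gzip
    import numpy as np

    # 'rt' decodes and applies newline translation in gzip.open itself,
    # sidestepping the unsupported 'rbU' mode
    with gzip.open('data.csv.gz', 'rt') as f:
        data = np.genfromtxt(f, delimiter=',')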
Re: [Numpy-discussion] printing array in tabular form
On 10.05.2013, at 1:20PM, Sudheer Joseph sudheer.jos...@yahoo.com wrote:
>
> If someone has a quick way I would like to learn from them, or get a
> reference where the formatting part is described, which was my intention
> while posting here. As I have been using fortran, I just tried to use it
> to explain my requirement.

Admittedly the formatting options in Python can be confusing to beginners,
precisely because they are much more powerful than for many other
languages. As already pointed out, formats of the type '(5i5)' are very
common to Fortran programs and thus readily supported by the language.
np.savetxt is just a convenience function to support a number of similarly
common output types, and it can create csv, tab-separated, or plenty of
other outputs from a numpy array just out of the box. But you added to the
confusion as you did not make it clear that you were not just requiring a
plain csv file such as your Fortran example would create (and the first
version did not even have the commas); since this is a rather non-standard
form you will just have to write a short loop yourself, whether you are
using Fortran or Python.

> In fact the program which should read this file requires it in a
> specified format which should look like
>
> IL = 1,2,3,4,5
>      1,2,3,4,5
>      1,2,3,4,5

The formats are all documented at
http://docs.python.org/2/library/string.html#format-specification-mini-language
One important thing to know is that you can pretty much add (i.e.
concatenate) them like strings:

print(("%6s" + 4*"%d," + "%d\n") % (("IL = ",) + tuple(IL[:5])))

or, perhaps a bit clearer:

fmt = "%6s" + 4*"%d," + "%d\n"
print_t = ("IL = ",) + tuple(IL[:5])
print(fmt % print_t)

The other important bit to keep in mind is that all arguments have to be
passed as tuples. This should allow you to write a loop to print with a
header or an empty header column for the subsequent lines as you see fit;
a sketch follows below. Except for the string field, which is explicitly
formatted %s here, this is mostly equivalent to the example Henry just
posted.

HTH,

Derek
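A sketch of such a loop, printing "IL = " on the first line and an aligned
blank header field on the continuation lines (the contents of IL are
assumed here):

    IL = list(range(1, 26))          # 25 values, printed 5 per line

    fmt = "%6s" + 4*"%d," + "%d"
    for start in range(0, len(IL), 5):
        head = "IL = " if start == 0 else ""
        print(fmt % ((head,) + tuple(IL[start:start+5])))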
Re: [Numpy-discussion] printing array in tabular form
On 10.05.2013, at 2:51PM, Daniele Nicolodi dani...@grinta.net wrote:
>
> If you wish to format numpy arrays preceding them with a variable name,
> the following is a possible solution that gives the same formatting as in
> your example:
>
> import numpy as np
> import sys
>
> def format(out, v, name):
>     header = "{} = ".format(name)
>     out.write(header)
>     np.savetxt(out, v, fmt="%d", delimiter=", ",
>                newline="\n" + " " * len(header))
>     out.write("\n")
>
> IL = np.array([range(5), ] * 5)
> format(sys.stdout, IL, "IL")

That is a quite ingenious way to use savetxt functionality to write that
extra column! Only two comments:

Don't call that function format, as it would mask the 'format' builtin!

In the present version it will only work with a file handle; to print to a
file you would need to pass it as

format(open(fname, 'a'), …

or check for that case inside the function.

Cheers,

Derek
Re: [Numpy-discussion] printing array in tabular form
Dear Sudheer,

On 07.05.2013, at 11:14AM, Sudheer Joseph sudheer.jos...@yahoo.com wrote:
>
> I need to print a few arrays in a tabular form. For example, below array
> IL has 25 elements; is there an easy way to print this as a 5x5 comma
> separated table in python?
>
> IL = []
> for i in np.arange(1, bno+1):
>     IL.append(i)
> print(IL)

Assuming you want this table printed to a file, savetxt does just what you
need. In brief, for your case

np.savetxt("file.txt", np.asarray(IL).reshape(-1, 5), fmt='%5d', delimiter=',')

should print it in the requested form; you can refer to the savetxt
documentation for further options.

HTH,

Derek
Re: [Numpy-discussion] Please stop bottom posting!!
On 12.04.2013, at 2:14AM, Charles R Harris charlesr.har...@gmail.com wrote:
>
> On Thu, Apr 11, 2013 at 5:49 PM, Colin J. Williams
> cjwilliam...@gmail.com wrote:
>> On 11/04/2013 7:20 PM, Paul Hobson wrote:
>>> On Wed, Apr 3, 2013 at 4:28 PM, Doug Coleman doug.cole...@gmail.com wrote:
>>>> Also, gmail bottom-posts by default. It's transparent to gmail users.
>>>> I'd imagine they are some of the biggest offenders.
>>>
>>> Interesting. Mine go to the top by default and I always have to expand
>>> the quoted text, trim down as necessary, and then reply below the
>>> relevant bits. A quick gander at gmail's settings doesn't offer
>>> anything obvious. I'll dig deeper later.
>>
>> Bottom posting seems to be the accepted Usenet standard. I don't care,
>> can't someone make a decision, so that we all do the same thing? Please
>> develop a rationale or toss a coin and let us know. Numpy needs a BDFL
>> (or a shorter term, if you wish).
>
> It's always been bottom posting.

In German this kind of faux pas is usually labelled "TOFU" for "text on
top, full quote underneath", and I think it has been a bit overlooked so
far that the full quote part probably is the bigger problem. IOW a call to
try and trim the OP more rigorously should help a lot, and I'd think most
people then can agree on bottom posting (and I know the issue with mail
clients doing that automatically - the thread in question looks quite
readable in Mountain Lion's Mail.app, but a nightmare on Snow Leopard!).

Cheers,

Derek
Re: [Numpy-discussion] Trouble building numpy on different version of OSX.
On 14.02.2013, at 3:55PM, Steve Spicklemire st...@spvi.com wrote:
>
> I got Xcode 4.6 from the App Store. I don't think it's the SDK since the
> python 2.7 version builds fine. It's just the 3.2 version that doesn't
> have the
> -I/Library/Frameworks/Python.Framework/Versions/3.2/include/python3.2m
> in the compile options line. When I run setup for 2.7 I see the right
> include. I'm just not sure where setup is building those options, and why
> they're not working on 10.7 and 10.8 and python3.2. Strange!

Where did you get the python3.2 from? Building the 1.7.0 release works for
me under 10.8 and Xcode 4.6, both with the system-provided
/usr/bin/python2.7 and with fink-installed versions of python2.7 and
python3.2, but in no case is it linking or including any 10.6 SDK:

C compiler: gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

compile options: '-Inumpy/core/include
-Ibuild/src.macosx-10.8-x86_64-3.2/numpy/core/include/numpy
-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core
-Inumpy/core/src/npymath -Inumpy/core/src/multiarray
-Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include
-I/sw/include/python3.2m
-Ibuild/src.macosx-10.8-x86_64-3.2/numpy/core/src/multiarray
-Ibuild/src.macosx-10.8-x86_64-3.2/numpy/core/src/umath -c'

HTH,

Derek
Re: [Numpy-discussion] how do I specify maximum line length when using savetxt?
On 06.12.2012, at 12:40AM, Mark Bakker wrote:
>
> I guess I wasn't explicit enough. Say I have an array with 100 numbers
> and I want to write it to a file with 6 numbers on each line (and hence,
> only 4 on the last line). Can I use savetxt to do that? What other easy
> tool does numpy have to do that?

I've just been looking into a similar case and I think there is no easy
tool for this - i.e. nothing comparable to Fortran's '(6e10.3)' or the
like - so if your array does not reshape to an Nx6 array, you'd probably
have to write something customised yourself (a sketch follows below).
It would not be terribly difficult to add such functionality to savetxt,
but then, unless you want the output file to be more human-readable, there
is not really a strong case for writing a shape (100,) array into 16 lines
plus an incomplete one - it just would not play well with reading back in
and then determining the right shape automatically…

HTH,

Derek
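A minimal sketch of such a custom writer, wrapping a flat array at a fixed
number of values per line, Fortran '(6e10.3)' style (file name hypothetical):

    import numpy as np

    def save_wrapped(fname, arr, per_line=6, fmt='%10.3e'):
        """Write a flat array with a fixed number of values per line;
        the last line may be shorter."""
        flat = np.ravel(arr)
        with open(fname, 'w') as f:
            for start in range(0, flat.size, per_line):
                f.write(' '.join(fmt % v
                                 for v in flat[start:start + per_line]) + '\n')

    save_wrapped('out.txt', np.arange(100.))   # 16 full lines + one with 4 values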
Re: [Numpy-discussion] Simple Loadtxt question
On 29.11.2012, at 1:21AM, Robert Love wrote:
>
> I have a file with thousands of lines like this:
>
> Signal was returned in 204 microseconds
> Signal was returned in 184 microseconds
> Signal was returned in 199 microseconds
> Signal was returned in 4274 microseconds
> Signal was returned in 202 microseconds
> Signal was returned in 189 microseconds
>
> I try to read it like this:
>
> data = np.loadtxt('dummy.data', dtype={'names':('label','times','musec'),
>                                        'fmts':('|S23','i8','|S13')})
>
> It fails, I think, because it wants a string format and field for each of
> the words 'Signal', 'was', 'returned' etc. Can I make it treat that whole
> string before the number as one string, one field? All I really care
> about is the numbers anyway.

Then how about

np.loadtxt('dummy.data', usecols=(4, ))

Cheers,

Derek
Re: [Numpy-discussion] Synonym standards
On 27.07.2012, at 3:27PM, Benjamin Root wrote:
>
>>> I would prefer not to use "from xxx import *", because of the name
>>> pollution. The name convention that I copied above facilitates avoiding
>>> the pollution. In the same spirit, I've used: import pylab as plb
>>
>> But in that same spirit, using np and plt separately is preferred.
>>
>> "Namespaces are one honking great idea -- let's do more of those!"
>> from http://www.python.org/dev/peps/pep-0020/
>
> Absolutely correct. The namespace pollution is exactly why we encourage
> converts to move over from the pylab mode to separating out the numpy and
> pyplot namespaces. There are very subtle issues that arise when doing
> "from pylab import *", such as overriding the built-in "any" and "all".
> The only real advantage of the pylab mode over separating out numpy and
> pyplot is conciseness, which many matlab users expect at first.

It unfortunately also comes with the convenience of using the
"ipython --pylab" mode - does anyone know how to turn the "import *" part
off, or how to create a similar working environment with ipython that does
keep namespaces clean?

Cheers,

Derek
Re: [Numpy-discussion] Synonym standards
On 27 Jul 2012, at 17:58, Tony Yu wrote:
>
> On Fri, Jul 27, 2012 at 11:39 AM, Derek Homeier
> de...@astro.physik.uni-goettingen.de wrote:
>> It unfortunately also comes with the convenience of using the
>> "ipython --pylab" mode - does anyone know how to turn the "import *"
>> part off, or how to create a similar working environment with ipython
>> that does keep namespaces clean?
>
> There's a config flag that you can add to your ipython profile:
>
> c.TerminalIPythonApp.pylab_import_all = False
>
> For example, my profile is in ~/.ipython/profile_default/ipython_config.py

Thanks, that was exactly what I was looking for - together with

c.TerminalIPythonApp.exec_lines = ['import sys',
                                   'import numpy as np',
                                   'import matplotlib as mpl',
                                   'import matplotlib.pyplot as plt']

etc. to have the shortcuts.

Cheers,

Derek
Re: [Numpy-discussion] Synonym standards
On 27.07.2012, at 8:30PM, Fernando Perez fperez@gmail.com wrote:
>
> On Fri, Jul 27, 2012 at 9:43 AM, Derek Homeier
> de...@astro.physik.uni-goettingen.de wrote:
>> thanks, that was exactly what I was looking for - together with
>>
>> c.TerminalIPythonApp.exec_lines = ['import sys',
>>                                    'import numpy as np',
>>                                    'import matplotlib as mpl',
>>                                    'import matplotlib.pyplot as plt']
>
> Note that if you do this only and don't use %pylab interactively or the
> --pylab flag, then you will *not* get the proper non-blocking control of
> the matplotlib event loop integrated with the terminal or qtconsole.
>
> In summary, following Tony's suggestion is enough to give you:
>
> - event loop integration when you do --pylab at the prompt or %pylab in
>   ipython.
> - the np, mpl and plt shortcuts
> - no 'import *' at all.
>
> So that should be sufficient, but you should still use --pylab or %pylab
> to indicate to IPython that you want the mpl event loops to work in
> conjunction with the shell.

Yes, I was aware of that; without the pylab option, at least with the
macosx backend, windows either would not draw and refresh properly, or
block the shell after a draw() or show() - that's why I was asking how to
avoid the 'import *' with it. I have not used the %pylab builtin before,
though.

Cheers,

Derek
Re: [Numpy-discussion] silly isscalar question
On 29 May 2012, at 15:00, Mark Bakker wrote:
>
> Why does isscalar('hello') return True? I thought it would check for a
> number?

No, it checks for something that is of 'scalar type', which probably can be
translated as 'not equivalent to an array'. Since strings can form numpy
arrays, I guess the logic behind this is that the string is the atomic
block of an array of dtype 'S' - for comparison,
np.isscalar(['hello']) == False.
I note the fine distinction between np.isscalar( ('hello') ) and
np.isscalar( ('hello'), )...

Cheers,

Derek
Re: [Numpy-discussion] silly isscalar question
On 29 May 2012, at 15:42, Nathaniel Smith wrote:
>
>> I note the fine distinction between np.isscalar( ('hello') ) and
>> np.isscalar( ('hello'), )...
>
> NB you mean np.isscalar( ('hello',) ), which creates a single-element
> tuple. A trailing comma attached to a value in Python normally creates a
> tuple, but in a function argument list it is treated as separating
> arguments instead, and a trailing empty argument is ignored. The
> parentheses need to be around the comma to hide it from the argument list
> parsing rule so that the tuple rule can see it. (Probably you know this,
> but for anyone reading the archives later...)

Correct, sorry for the typo! I was actually puzzled by the habit of what
seemed to me automatic unpacking of the simple case ('hello') as compared
to ('hello',); I only now looked up that by the Python syntax it is indeed
the comma that makes the tuple, not the parentheses, the latter only
becoming necessary to protect the comma as you describe above. Just
stumbled on this as in several cases numpy's rules for creating arrays
from tuples are slightly different from those for creating arrays from
lists.

Cheers,

Derek
Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1
On 06.05.2012, at 8:16AM, Paul Anton Letnes wrote:
>
> All tests for 1.6.2rc1 pass on Mac OS X 10.7.3
> python 2.7.2
> gcc 4.2 (Apple)

Passing as well on 10.6 x86_64 and on 10.5.8 ppc with python
2.5.6/2.6.6/2.7.2 and Apple gcc 4.0.1, but I am getting one failure on
Lion (same with Python 2.5.6+2.6.7):

Python version 2.7.3 (default, May 6 2012, 15:05:35) [GCC 4.2.1 Compatible Apple Clang 3.0 (tags/Apple/clang-211.12)]
nose version 1.1.2

======================================================================
FAIL: Test basic arithmetic function errors
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sw/lib/python2.7/site-packages/numpy/testing/decorators.py", line 215, in knownfailer
    return f(*args, **kwargs)
  File "/sw/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py", line 323, in test_floating_exceptions
    lambda a, b: a*b, ft_tiny, ft_tiny)
  File "/sw/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py", line 271, in assert_raises_fpe
    "Type %s did not raise fpe error '%s'." % (ftype, fpeerr))
  File "/sw/lib/python2.7/site-packages/numpy/testing/utils.py", line 34, in assert_
    raise AssertionError(msg)
AssertionError: Type <type 'numpy.complex64'> did not raise fpe error ''.

----------------------------------------------------------------------
Ran 3551 tests in 130.778s

FAILED (KNOWNFAIL=3, SKIP=4, failures=1)

Cheers,

Derek
Re: [Numpy-discussion] how to check type of array?
On 29 Mar 2012, at 13:54, Chao YUE wrote:
>
> how can I check the type of an array in an if condition expression?
>
> In [75]: type(a)
> Out[75]: <type 'numpy.ndarray'>
>
> In [76]: a.dtype
> Out[76]: dtype('int32')
>
> a.dtype == 'int32'?

This, and

a.dtype == 'i4'
a.dtype == np.int32

all work. For a more general check (e.g. if it is any type of integer),
you can do

np.issubclass_(a.dtype.type, np.integer)

See also help(np.issubdtype)

Cheers,

Derek
Re: [Numpy-discussion] how to check type of array?
On 29 Mar 2012, at 14:49, Robert Kern wrote:
>
>> all work. For a more general check (e.g. if it is any type of integer),
>> you can do
>>
>> np.issubclass_(a.dtype.type, np.integer)
>
> I don't recommend using that. Use np.issubdtype(a.dtype, np.integer)
> instead.

Sorry, you're right, this works the same way - I had the impression from
the documentation that tests like np.issubdtype(np.int16, np.integer)
would not work, but they do.

Cheers,

Derek
Re: [Numpy-discussion] How to Extract the Number of Rows and Columns in a Matrix
On 27.03.2012, at 1:26AM, Olivier Delalleau wrote:
>
> len(M) will give you the number of rows of M. For columns I just use
> M.shape[1] myself, I don't know if there exists a shortcut.

You can use tuple unpacking, if that helps keeping your code concise…

nrow, ncol = M.shape

Cheers,

Derek

> On 26 March 2012 19:03, Stephanie Cooke cooke.stepha...@gmail.com wrote:
>>
>> Hello,
>>
>> I would like to extract the number of rows and columns of a matrix
>> individually. The shape command outputs the rows and columns together,
>> but are there commands that will separately give the rows and separately
>> give the columns?
>>
>> Thanks
Re: [Numpy-discussion] AttributeError with shape command
On 27.03.2012, at 2:07AM, Stephanie Cooke wrote:
>
> I am new to numpy. When I try to use the command array.shape, I get the
> following error:
>
> AttributeError: 'list' object has no attribute 'shape'
>
> Is anyone familiar with this type of error?

It means 'array' actually is not one - more precisely, not an object of
type np.ndarray. How did you create your array? If it originates just from
a list of numbers, you can create an array from it by 'np.array(list)'
(assuming a previous 'import numpy as np'). It's also possible that a
function has returned a list of arrays where you might have expected a
single array - so it really depends on the circumstances.

HTH,

Derek
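A minimal illustration of the conversion suggested above:

    import numpy as np

    data = [1, 2, 3]        # a plain Python list has no .shape
    arr = np.array(data)    # convert it to an ndarray
    print(arr.shape)        # -> (3,)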
Re: [Numpy-discussion] Trying to read 500M txt file using numpy.genfromtxt within ipython shell
On 20 Mar 2012, at 14:40, Chao YUE wrote:
>
> I would agree, thanks! I use gawk to separate the file into many files by
> year, then it would be easier to handle. Anyway, it's not a good practice
> to produce such huge txt files.

Indeed it's not, but it's also not good practice to load the entire content
of text files as python lists into memory, as unfortunately all the numpy
readers are still doing. But this has been discussed on this list and
improvements are under way. For your problem at hand, the textreader
Warren Weckesser recently made known - can't find the post right now, but
you can find it at
https://github.com/WarrenWeckesser/textreader
might be helpful. It is still under construction, but for a plain csv file
such as yours it should be working already. And since the text parsing is
implemented in C, it should also give you a huge speedup for your 1/2 GB!
For additional profiling, similar to what David suggested, it would
certainly be a good idea to read in smaller chunks of the file and write
them directly to the netCDF file; see the sketch below. Note that you can
already read single lines at a time with the likes of

from StringIO import StringIO
f = open('file.txt', 'r')
np.genfromtxt(StringIO(f.next()), delimiter=',')

but I don't think it would work this way with textreader, and iterating
such a small loop over lines in Python would defeat the point of using a
fast reader. As your actual data would be 1GB in numpy, memory usage with
textreader should also not be critical yet.

Cheers,

Derek
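A sketch of the chunked-reading idea (chunk size and file name are
hypothetical, and process() is a placeholder for e.g. appending to a netCDF
variable):

    import itertools
    import numpy as np

    def read_in_chunks(fname, chunk_lines=100000):
        """Yield the file's contents as arrays of at most chunk_lines rows,
        so the full text never has to be buffered in memory at once."""
        with open(fname) as f:
            while True:
                chunk = list(itertools.islice(f, chunk_lines))
                if not chunk:
                    break
                yield np.genfromtxt(chunk, delimiter=',')

    for block in read_in_chunks('data.csv'):
        process(block)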
Re: [Numpy-discussion] remove redundant dimension in a ndarray?
Dear Chao, Do we have a function in numpy that can automatically shrink an ndarray with redundant dimensions? Like, I have an ndarray with a shape of (13,1,1,160,1); now I have written a small function to change the array to a dimension of (13,160) [reducing the extra dimensions with length 1], but I would just like to know whether there is already something that can do this? np.squeeze Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
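For the record, that makes it a one-liner:
>>> import numpy as np
>>> a = np.zeros((13, 1, 1, 160, 1))
>>> np.squeeze(a).shape
(13, 160)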
Re: [Numpy-discussion] Saving and loading a structured array from a TEXT file
On 23 Jan 2012, at 21:15, Emmanuel Mayssat wrote: Is there a way to save a structured array in a text file? My problem is not so much in the saving procedure, but rather in the 'reloading' procedure. See below In [3]: import numpy as np In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', 'i8'), ('bar', 'f8')]) In [5]: r.tofile('toto.txt',sep='\n') bash-4.2$ cat toto.txt ('1', 1, 1.0) ('1', 1, 1.0) ('1', 1, 1.0) In [7]: r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype) --- ValueError Traceback (most recent call last) /home/cls1fs/clseng/10/<ipython-input-7-b07ba265ede7> in <module>() ----> 1 r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype) ValueError: Unable to read character files of that array type I think most of the np.fromfile functionality only works for binary input; for reading text input np.loadtxt and np.genfromtxt are the (currently) recommended functions. It is a bit tricky to read the format generated by tofile() in the above example, but the following should work: cnv = {0: lambda s: s.lstrip('('), -1: lambda s: s.rstrip(')')} r2 = np.loadtxt('toto.txt', delimiter=',', converters=cnv, dtype=r.dtype) Generally loadtxt works more smoothly together with savetxt, but the latter unfortunately does not offer an easy way to save structured arrays (note to self and others currently working on npyio: definitely room for improvement!). HTH, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Saving and loading a structured array from a TEXT file
On 23 Jan 2012, at 22:07, Derek Homeier wrote: In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', 'i8'), ('bar', 'f8')]) In [5]: r.tofile('toto.txt',sep='\n') bash-4.2$ cat toto.txt ('1', 1, 1.0) ('1', 1, 1.0) ('1', 1, 1.0) cnv = {0: lambda s: s.lstrip('('), -1: lambda s: s.rstrip(')')} r2 = np.loadtxt('toto.txt', delimiter=',', converters=cnv, dtype=r.dtype) Generally loadtxt works more smoothly together with savetxt, but the latter unfortunately does not offer an easy way to save structured arrays (note to self and others currently working on npyio: definitely room for improvement!). For the record, in that example np.savetxt('toto.txt', r, fmt='%s,%d,%f') would work as well, saving you the custom converter for loadtxt - it could just become tedious to work out the format for more complex structures, so an option to construct this automatically from r.dtype could certainly be a nice enhancement. Just wondering, is there something like the inverse operator to np.format_parser, i.e. mapping each dtype to a default print format specifier? Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
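Not aware of a ready-made inverse, but a minimal sketch of such a mapping is easy enough (the helper name and the choice of default specifiers are mine, nothing from numpy itself):
import numpy as np

def default_fmt(dt):
    # map each field of a structured dtype to a plausible print format
    fmap = {'i': '%d', 'u': '%d', 'f': '%.18e', 'S': '%s'}
    return ','.join(fmap.get(dt[name].kind, '%s') for name in dt.names)

r = np.ones(3, dtype=[('name', '|S5'), ('foo', 'i8'), ('bar', 'f8')])
np.savetxt('toto.txt', r, fmt=default_fmt(r.dtype))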
Re: [Numpy-discussion] 'Advanced' save and restore operation
On 24 Jan 2012, at 01:45, Olivier Delalleau wrote: Not sure if there's a better way, but you can do it with some custom load and save functions: with open('f.txt', 'w') as f: ... f.write(str(x.dtype) + '\n') ... numpy.savetxt(f, x) with open('f.txt') as f: ... dtype = f.readline().strip() ... y = numpy.loadtxt(f).astype(dtype) I'm not sure how that'd work with structured arrays though. For the dict of parameters you'd have to write your own load/save piece of code too if you need a clean text file. -=- Olivier 2012/1/23 Emmanuel Mayssat emays...@gmail.com After having saved data, I need to know/remember the data dtype to restore it correctly. Is there a way to save the dtype with the data? (I guess the header parameter of savetxt could help, but that is only available in v2.0+.) I would like to save several related structured arrays and a dictionary of parameters into a TEXT file. Is there an easy way to do that? (maybe an xml file, or maybe an archive zip file of other files, or ...) Any recommendation is helpful. asciitable might be of some help, but to implement all of your required functionality, you'd probably still have to implement your own Reader class: http://cxc.cfa.harvard.edu/contrib/asciitable/ Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] find location of maximum values
On 04.01.2012, at 5:10AM, questions anon wrote: Thanks for your responses but I am still having difficulties with this problem. Using argmax gives me one very large value and I am not sure what it is. There shouldn't be any issues with the shape. The latitude and longitude are always the same shape (covering a state) and the temperature (TSFC) data are hourly for a whole month. There will be an issue if not TSFC.shape == TIME.shape == LAT.shape == LON.shape One needs more information on the structure of these data to say anything definite, but if e.g. your TSFC data have a time and a location dimension, argmax will by default return the index for the flattened array (see the argmax documentation for details, and how to use the axis keyword to get a different output). This might be the very large value you mention, and if your location data have fewer dimensions, the index will easily be out of range. As Ben wrote, you'd need extra work to find the maximum location, depending on what maximum you are actually looking for. As a speculative example, let's assume you have the temperature data in an array(ntime, nloc) and the position data in array(nloc). Then TSFC.argmax(axis=1) would give you the index of the hottest place for each hour of the month (i.e. actually an array of ntime indices, pointing to as many different locations). To locate the maximum temperature for the entire month, your best way would probably be to first extract the array of (monthly) maximum temperatures in each location as tmax = TSFC.max(axis=0), which would have (in this example) the shape (nloc,), so you could directly use it to index LAT[tmax.argmax()] etc. Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
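Spelling the speculative example out in full (all shapes and data are made up, since the actual file layout is unknown):
import numpy as np

ntime, nloc = 720, 500                   # hours in a month x locations
TSFC = 40 * np.random.rand(ntime, nloc)  # stand-in temperature data
LAT = np.linspace(-39, -34, nloc)
LON = np.linspace(141, 150, nloc)

tmax = TSFC.max(axis=0)        # monthly maximum at each location
hottest = tmax.argmax()        # index of the overall hottest location
print LAT[hottest], LON[hottest], tmax[hottest]
# equivalently, converting the flattened argmax back to 2D indices:
itime, iloc = np.unravel_index(TSFC.argmax(), TSFC.shape)
assert iloc == hottest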
Re: [Numpy-discussion] structured array with submember assign
On 26.12.2011, at 7:37PM, Fabian Dill wrote: I have a problem with a structured numpy array. I create it like this: tiles = numpy.zeros((header['width'], header['height'], 3), dtype=numpy.uint8) and later on, assignments such as this: tiles[x, y, 0] = 3 Now uint8 is not sufficient anymore, but only for the first of the 3 values. uint16 for all of them would use too much RAM (an increase of 1-3 GB). I have tried using structured arrays, but the dtype is essentially always a tuple. tiles = numpy.zeros((header['width'], header['height'], 1), dtype='u2,u1,u1') tiles[x, y, 0] = 0 TypeError: expected an object with a buffer interface If you create a structured array, you probably don't want the third dimension, as the structure already spans three fields, and to assign to it you either need to address the fields explicitly (with the default field names 'f0', 'f1', 'f2'), or use an array with corresponding dtype: dt = 'u2,u1,u1' tiles = numpy.zeros((2,3), dtype=dt) tiles array([[(0, 0, 0), (0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0), (0, 0, 0)]], dtype=[('f0', '<u2'), ('f1', '|u1'), ('f2', '|u1')]) tiles['f0'][0] = 1 tiles[0,1] = np.array((3,4,5), dtype=dt) tiles array([[(1, 0, 0), (3, 4, 5), (1, 0, 0)], [(0, 0, 0), (0, 0, 0), (0, 0, 0)]], dtype=[('f0', '<u2'), ('f1', '|u1'), ('f2', '|u1')]) Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy1.6.1 install fortran compiler error
Hi Jack, In order to install scipy, I am trying to install numpy 1.6.1 on GNU/Linux Red Hat (kernel 2.6.18). But I got an error about the Fortran compiler. I have gfortran; I do not have f77/f90/g77/g90. that's good! I run: python setup.py build --fcompiler=gfortran It works well and tells me that customize Gnu95FCompiler Found executable /usr/bin/gfortran But then I get: building library npymath sources customize GnuFCompiler Could not locate executable g77 Could not locate executable f77 customize IntelFCompiler Could not locate executable ifort Could not locate executable ifc customize LaheyFCompiler Could not locate executable lf95 customize PGroupFCompiler Could not locate executable pgf90 Could not locate executable pgf77 customize AbsoftFCompiler Could not locate executable f90 customize NAGFCompiler Found executable /usr/bin/f95 customize Gnu95FCompiler customize Gnu95FCompiler using config C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC Do I have to install f77/f90/g77/g90? You did not send any actual error message here, so it's difficult to tell where exactly your install failed. But gfortran is preferred over f77 etc. and should in fact be automatically selected (even without the '--fcompiler=gfortran'); it is apparently also found in the right place. Could you send us the last lines of output with the error itself, or possibly everything following a line starting with "Traceback ..."; and also the output of `gfortran -v`? Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy1.6.1 install fortran compiler error
On 20.12.2011, at 9:01PM, Jack Bryan wrote: customize Gnu95FCompiler using config C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/remote/dcnl/Ding/backup_20100716/python272/include/python2.7 -c' gcc: _configtest.c gcc -pthread _configtest.o -o _configtest success! removing: _configtest.c _configtest.o _configtest C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/mypath/python272/include/python2.7 -c' gcc: _configtest.c gcc -pthread _configtest.o -o _configtest _configtest failure. The blas failures further up are non-fatal, but I am not sure about the _configtest.c, or why it once succeeds, then fails again - anyway the installation appears to have finished. at the end: running install_egg_info Removing /mypath/numpy/lib/python2.7/site-packages/numpy-1.6.1-py2.7.egg-info Writing /mypath/numpy/lib/python2.7/site-packages/numpy-1.6.1-py2.7.egg-info running install_clib Then I got: python Python 2.7.2 (default, Dec 20 2011, 12:32:10) [GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2 Type "help", "copyright", "credits" or "license" for more information. import numpy Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: No module named numpy I have updated PATH for bin and lib of numpy. You will need '/mypath/numpy/lib/python2.7/site-packages' in your PYTHONPATH - have you done that, and does it show up with 'import sys; sys.path' in the Python shell? Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
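A quick way to test this from within the Python shell, using the install path quoted above:
import sys
sys.path.append('/mypath/numpy/lib/python2.7/site-packages')
import numpy
print numpy.__version__
(The permanent fix would of course be setting PYTHONPATH in the shell environment instead.)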
Re: [Numpy-discussion] numpy 1.7.0 release?
On 06.12.2011, at 11:13PM, Wes McKinney wrote: This isn't the place for this discussion but we should start talking about building a *high performance* flat file loading solution with good column type inference and sensible defaults, etc. It's clear that loadtable is aiming for highest compatibility-- for example I can read a 2800x30 file in 50 ms with the read_table / read_csv functions I wrote myself recently in Cython (compared with loadtable taking 1s as quoted in the pull request), but I don't handle European decimal formats and lots of other sources of unruliness. I personally don't believe in sacrificing an order of magnitude of performance in the 90% case for the 10% case-- so maybe it makes sense to have two functions around: a superfast custom CSV reader for well-behaved data, and a slower, but highly flexible, function like loadtable to fall back on. I think R has two functions, read.csv and read.csv2, where read.csv2 is capable of dealing with things like the European decimal format. Generally I agree, there's a good case for that, but I have to point out that the 1s time quoted there was with all the auto-detection extravaganza turned on. Actually, if I remember the discussions right, in its default, single-pass reading mode it even comes close to genfromtxt and loadtxt (on my machine 150-200 ms for 2800 rows x 30 columns real*8). Originally loadtxt was intended to be that no-frills, fast reader, but in practice it is rarely faster than genfromtxt, as the conversion from input strings to Python objects seems to be the bottleneck most of the time. Speeding that up using Cython certainly would be a big gain (and then there also is the request to make loadtxt memory-efficient, which I have failed to follow up on for weeks and weeks…) Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loop through values in a array and find maximum as looping
On 07.12.2011, at 5:07AM, Olivier Delalleau wrote: I *think* it may work better if you replace the last 3 lines in your loop by:
a = all_TSFC[0]
if len(all_TSFC) > 1:
    N.maximum(a, TSFC, out=a)
Not 100% sure that would work though, as I'm not entirely confident I understand your code. -=- Olivier 2011/12/6 questions anon questions.a...@gmail.com Something fancier I think, I am able to compare the result with my previous method so I can easily see if I am doing something wrong. See code below:
all_TSFC = []
for (path, dirs, files) in os.walk(MainFolder):
    for dir in dirs:
        print dir
    path = path + '/'
    for ncfile in files:
        if ncfile[-3:] == '.nc':
            print "dealing with ncfiles:", ncfile
            ncfile = os.path.join(path, ncfile)
            ncfile = Dataset(ncfile, 'r+', 'NETCDF4')
            TSFC = ncfile.variables['T_SFC'][:]
            fillvalue = ncfile.variables['T_SFC']._FillValue
            TSFC = MA.masked_values(TSFC, fillvalue)
            ncfile.close()
            all_TSFC.append(TSFC)
            a = TSFC[0]
            for b in TSFC[1:]:
                N.maximum(a, b, out=a)
I also understood TSFC is already the array you want to work on, so above you'd just take a slice and overwrite the result in the next file iteration anyway. Iterating over the list all_TSFC should be correct, but I understood you don't want to load the entire input into memory in your working code. Then you can simply skip the list, you just need to take care of the initial conditions - something like the following should do:
path = path + '/'
a = None
for ncfile in files:
    if ncfile[-3:] == '.nc':
        print "dealing with ncfiles:", ncfile
        ncfile = os.path.join(path, ncfile)
        ncfile = Dataset(ncfile, 'r+', 'NETCDF4')
        TSFC = ncfile.variables['T_SFC'][:]
        fillvalue = ncfile.variables['T_SFC']._FillValue
        TSFC = MA.masked_values(TSFC, fillvalue)
        ncfile.close()
        if not is instance(a, N.ndarray):
            a = TSFC
        else:
            N.maximum(a, TSFC, out=a)
HTH, Derek
big_array = N.ma.concatenate(all_TSFC)
Max = big_array.max(axis=0)
print "max is", Max, "a is", a
___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loop through values in a array and find maximum as looping
On 07.12.2011, at 5:54AM, questions anon wrote: sorry, the 'all_TSFC' is for my other check of the maximum using concatenate and N.max; I know that works, so I am comparing it to this method. The only reason I need another method is memory error issues. I like the code I have written so far as it makes sense to me. I can't get the extra examples I have been given to work, and that is most likely because I don't understand them; these are the errors I get: Traceback (most recent call last): File "d:\plot_summarystats\test_plot_remove_memoryerror_max.py", line 46, in <module> N.maximum(a,TSFC,out=a) ValueError: non-broadcastable output operand with shape (106,193) doesn't match the broadcast shape (721,106,193) and OK, then it seems we did not indeed grasp the entire scope of the problem - since you have initialised a from the previous array TSFC (not from TSFC[0]?!), this can only mean the arrays read in come in different shapes? I don't quite understand how the previous version did not raise an error then; but if you only want the (106,193)-subarray you have indeed to keep the loop for b in TSFC[:]: N.maximum(a,b,out=a) But you would have to find some way to distinguish between ndim=2 and ndim=3 input, if really both can occur... Traceback (most recent call last): File "d:\plot_summarystats\test_plot_remove_memoryerror_max.py", line 45, in <module> if not instance(a, N.ndarray): NameError: name 'instance' is not defined Sorry, typing error (or devious auto-correct?) - this should be 'isinstance()'. Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] upsample or scale an array
On 03.12.2011, at 6:22PM, Robin Kraft wrote: That does repeat the elements, but doesn't get them into the desired order. In [4]: print a [[1 2] [3 4]] In [7]: np.tile(a, 4) Out[7]: array([[1, 2, 1, 2, 1, 2, 1, 2], [3, 4, 3, 4, 3, 4, 3, 4]]) In [8]: np.tile(a, 4).reshape(4,4) Out[8]: array([[1, 2, 1, 2], [1, 2, 1, 2], [3, 4, 3, 4], [3, 4, 3, 4]]) It's close, but I want to repeat the elements along the two axes, effectively stretching it by the lower right corner: array([[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]) It would take some more reshaping/axis rolling to get there, but it seems doable. Anyone know what combination of manipulations would work with the result of np.tile? Rolling was the keyword: np.rollaxis(np.tile(a, 4).reshape(2,2,-1), 2, 1).reshape(4,4) [[1 1 2 2] [1 1 2 2] [3 3 4 4] [3 3 4 4]] I leave the generalisation and timing up to you, but it seems for a = np.arange(M**2).reshape(M,-1) np.rollaxis(np.tile(a, N**2).reshape(M,N,-1), 2, 1).reshape(M*N,-1) should do the trick. Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] upsample or scale an array
On 03.12.2011, at 6:47PM, Olivier Delalleau wrote: Ah sorry, I hadn't read carefully enough what you were trying to achieve. I think the double repeat solution looks like your best option then. Considering that it is a lot shorter than fixing the tile() result, you are probably right (I've only now looked closer at the repeat() solution ;-). I'd still be interested in the performance - since I think none of the reshape or rollaxis operations actually move any data in memory (for numpy 1.6), it might still be faster. Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
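A quick sketch to check that the two approaches really agree, and to time them against each other (sizes chosen arbitrarily; assumes a square input array as in the example above):
import numpy as np

M, N = 300, 4
a = np.arange(M**2).reshape(M, -1)
up1 = a.repeat(N, axis=0).repeat(N, axis=1)          # double-repeat
up2 = np.rollaxis(np.tile(a, N**2).reshape(M, N, -1), 2, 1).reshape(M*N, -1)
assert (up1 == up2).all()
# in IPython, %timeit each of the two upsampling lines to compare runtimes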
Re: [Numpy-discussion] numpy.array() of mixed integers and strings can truncate data
On 1 Dec 2011, at 17:39, Charles R Harris wrote: On Thu, Dec 1, 2011 at 6:52 AM, Thouis (Ray) Jones tho...@gmail.com wrote: Is this expected behavior? np.array([-345,4,2,'ABC']) array(['-34', '4', '2', 'ABC'], dtype='|S3') Given that strings should be the result, this looks like a bug. It's a bit of a corner case that probably slipped through during the recent work on casting. There need to be tests for these sorts of things, so if you find more oddities, post them so we can add them. As it is not dependent on the string appearing before or after the numbers, numerical values appear to always be processed first, before any string transformation, even if you explicitly specify the string format - consider the following (1.6.1): np.array((2, 12, 0.1+2j)) array([ 2.0+0.j, 12.0+0.j, 0.1+2.j]) np.array((2, 12, 0.001+2j)) array([ 2.00000000e+00+0.j, 1.20000000e+01+0.j, 1.00000000e-03+2.j]) np.array((2, 12, 0.001+2j), dtype='|S8') array(['2', '12', '(0.001+2'], dtype='|S8') - notice the last value is only truncated because it had first been converted into a standard complex representation, so maybe the problem is already in the way Python treats the input. Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy.array() of mixed integers and strings can truncate data
On 1 Dec 2011, at 21:35, Chris Barker wrote: On 12/1/2011 9:15 AM, Derek Homeier wrote: np.array((2, 12, 0.001+2j), dtype='|S8') array(['2', '12', '(0.001+2'], dtype='|S8') - notice the last value is only truncated because it had first been converted into a standard complex representation, so maybe the problem is already in the way Python treats the input. no -- it's truncated because you've specified an 8-char-long string, and the string representation of complex is longer than that. I assume that numpy is using the object's __str__ or __repr__: In [13]: str(0.001+2j) Out[13]: '(0.001+2j)' In [14]: repr(0.001+2j) Out[14]: '(0.001+2j)' That's what I meant with the Python side of the issue, but you're right, there is no numerical conversion involved. I think the only bug we've identified here is that numpy is selecting the string size based on the longest string input, rather than checking to see how long the string representation of the numeric input is as well. if there is a long-enough string in there, it works fine: In [15]: np.array([-345,4,2,'ABC', 'abcde']) Out[15]: array(['-345', '4', '2', 'ABC', 'abcde'], dtype='|S5') An open question is what it should do if you specify the length of the string dtype, but one of the values can't be fit into that size. At this point, it truncates, but should it raise an error? I would probably raise a warning rather than an error - I think if the user explicitly specifies a string length, they should be aware that the data might be truncated (and might even want this behaviour). Another issue could be that the string representation can look quite different from what has been typed in, like In [95]: np.array(('abcdefg', 12, 0.00001+2j), dtype='|S12') Out[95]: array(['abcdefg', '12', '(1e-05+2j)'], dtype='|S12') but then I think one has to accept that _0.00001+2j_ is not a string and thus cannot be guaranteed to be represented in that exact way - it can be either understood as a numerical object or not at all (i.e. one should just type it in as a string - with quotes - if one wants string behaviour). Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
Hi, On 25 Oct 2011, at 21:14, Pauli Virtanen wrote: On 25.10.2011 20:29, Matthew Brett wrote: [clip] In [7]: (res-1) / 2**32 Out[7]: 8589934591.98 In [8]: np.float((res-1) / 2**32) Out[8]: 4294967296.0 Looks like a bug in the C library installed on the machine, then. It's either in wontfix territory for us, or in the 'cast to doubles before formatting' one. In the latter case, one would have to maintain a list of broken C libraries (ugh). as it appears to be a Tiger-only problem, probably the former? On 25 Oct 2011, at 21:13, Matthew Brett wrote: [mb312@jerry ~]$ uname -a Darwin jerry.bic.berkeley.edu 8.11.0 Darwin Kernel Version 8.11.0: Wed Oct 10 18:26:00 PDT 2007; root:xnu-792.24.17~1/RELEASE_PPC Power Macintosh powerpc Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Printing individual array elements with at least 15 significant digits
On 15.10.2011, at 9:21PM, Hugo Gagnon wrote: I need to print individual elements of a float64 array to a text file. However in the file I only get 12 significant digits, the same as with: a = np.zeros(3) a.fill(1./3) print a[0] 0.333333333333 len(str(a[0])) - 2 12 whereas len(repr(a[0])) - 2 17 which makes more sense, since I am expecting at least 15 significant digits… So how can I print a np.float64 with at least 15 significant digits (without repr!)? You mean like '%.15e' % (1./3) '3.333333333333333e-01' ? If you are using e.g. savetxt to print to the file, you can specify the format the same way (actually the default for savetxt is already %.18e, which should satisfy your demands). HTH, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Float128 integer comparison
On 15.10.2011, at 9:42PM, Aronne Merrelli wrote: On Sat, Oct 15, 2011 at 1:12 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Continuing the exploration of float128 - can anyone explain this behavior? np.float64(9223372036854775808.0) == 9223372036854775808L True np.float128(9223372036854775808.0) == 9223372036854775808L False int(np.float128(9223372036854775808.0)) == 9223372036854775808L True np.round(np.float128(9223372036854775808.0)) == np.float128(9223372036854775808.0) True I know little about numpy internals, but while fiddling with this, I noticed a possible clue: np.float128(9223372036854775808.0) == 9223372036854775808L False np.float128(4611686018427387904.0) == 4611686018427387904L True np.float128(9223372036854775808.0) - 9223372036854775808L Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unsupported operand type(s) for -: 'numpy.float128' and 'long' np.float128(4611686018427387904.0) - 4611686018427387904L 0.0 My speculation - 9223372036854775808L is the first integer that is too big to fit into a signed 64 bit integer. Python is OK with this, but that means it must be containing that value in some more complicated object. Since you don't get the type error between float64() and long: np.float64(9223372036854775808.0) - 9223372036854775808L 0.0 Maybe there are some unimplemented pieces in numpy for dealing with operations between float128 and python arbitrary longs? I could see the == test just producing False in that case, because it defaults back to some object equality test which isn't actually looking at the numbers. That seems to make sense, since even upcasting from a np.float64 still lets the test fail: np.float128(np.float64(9223372036854775808.0)) == 9223372036854775808L False while np.float128(9223372036854775808.0) == np.uint64(9223372036854775808L) True and np.float128(9223372036854775809) == np.uint64(9223372036854775809L) False np.float128(np.uint(9223372036854775809L)) == np.uint64(9223372036854775809L) True Showing again that the normal casting to, or reading in of, a np.float128 internally inevitably calls the python float(), as already suggested in one of the parallel threads (I think this also came up with some of the tests for precision) - leading to different results than when you can convert from a np.int64 - this makes the outcome look even weirder: np.float128(9223372036854775807.0) - np.float128(np.int64(9223372036854775807)) 1.0 np.float128(9223372036854775296.0) - np.float128(np.int64(9223372036854775807)) 1.0 np.float128(9223372036854775295.0) - np.float128(np.int64(9223372036854775807)) -1023.0 np.float128(np.int64(9223372036854775296)) - np.float128(np.int64(9223372036854775807)) -511.0 simply due to the nearest np.float64 always being equal to MAX_INT64 in the two first cases above (or anything in between)... Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] genfromtxt
Hi Nils, On 11 Oct 2011, at 16:34, Nils Wagner wrote: How do I use genfromtxt to read a file with the following lines
1 1 2.2592365264892578D+01
2 2 2.2592365264892578D+01
1 3 2.669845581055D+00
3 3 2.2592365264892578D+01
2 4 2.669845581055D+00
4 4 2.2592365264892578D+01
3 5 2.669845581055D+00
5 5 2.2592365264892578D+01
4 6 2.669845581055D+00
6 6 2.2592365264892578D+01
1 7 2.9814243316650391D+00
7 7 1.7259031295776367D+01
2 8 2.9814243316650391D+00
8 8 1.7259031295776367D+01
...
names = ('i', 'j', 'v')
A = np.genfromtxt('bmll.mtl', dtype=[('i','int'),('j','int'),('v','d')], names=names)
V = A[:]['v']
V
array([ NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN])
yields NaN, while
convertfunc = lambda x: x.replace('D','E')
names = ('i', 'j', 'v')
A = np.genfromtxt('bmll.mtl', dtype=[('i','int'),('j','int'),('v','|S24')], names=names, converters={'v': convertfunc})
V = A[:]['v'].astype(float)
V
array([ 22.59236526, 22.59236526, 2.66984558, 22.59236526, 2.66984558, 22.59236526, 2.66984558, 22.59236526, 2.66984558, 22.59236526, 2.98142433, 17.2590313 , 2.98142433, 17.2590313 , 2.98142433, 2.98142433, 2.66984558, 22.59236526, 2.98142433, 2.98142433, 2.66984558, 22.59236526, 2.98142433, 2.98142433, 2.66984558, 22.59236526, 2.98142433, 2.98142433, 2.66984558, 22.59236526, 2.98142433, 2.66984558, 17.2590313 , 2.98142433, 2.66984558, 17.2590313 ])
works fine. took me a moment to figure out what the actual problem remaining was, but I expect you'd prefer it to load directly into a float record? The problem is simply that the converter _replaces_ the default converter function (which would be float(x) in this case), rather than operating on top of it. Try instead convertfunc = lambda x: float(x.replace('D','E')) and you should be ready to use ('v', 'd') as the dtype (BTW, specifying 'names' is redundant in the above example). This behaviour is only hinted at in the docstring example, so maybe the documentation should be clearer here. Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Nice float - integer conversion?
On 11 Oct 2011, at 20:06, Matthew Brett wrote: Have I missed a fast way of doing nice float to integer conversion? By nice I mean, rounding to the nearest integer, converting NaN to 0, inf, -inf to the max and min of the integer range? The astype method and cast functions don't do what I need here: In [40]: np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16) Out[40]: array([1, 0, 0, 0], dtype=int16) In [41]: np.cast[np.int16](np.array([1.6, np.nan, np.inf, -np.inf])) Out[41]: array([1, 0, 0, 0], dtype=int16) Have I missed something obvious? np.[a]round comes closer to what you wish (is there consensus that NaN should map to 0?), but not quite there, and it's not really consistent either! In [42]: c = np.zeros(4, np.int16) In [43]: d = np.zeros(4, np.int32) In [44]: np.around([1.6,np.nan,np.inf,-np.inf], out=c) Out[44]: array([2, 0, 0, 0], dtype=int16) In [45]: np.around([1.6,np.nan,np.inf,-np.inf], out=d) Out[45]: array([ 2, -2147483648, -2147483648, -2147483648], dtype=int32) Perhaps a starting point to harmonise this behaviour and get it closer to your expectations (it still would not be really nice having to define the output array first, I guess)... Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Nice float - integer conversion?
On 11.10.2011, at 9:18PM, josef.p...@gmail.com wrote: In [42]: c = np.zeros(4, np.int16) In [43]: d = np.zeros(4, np.int32) In [44]: np.around([1.6,np.nan,np.inf,-np.inf], out=c) Out[44]: array([2, 0, 0, 0], dtype=int16) In [45]: np.around([1.6,np.nan,np.inf,-np.inf], out=d) Out[45]: array([ 2, -2147483648, -2147483648, -2147483648], dtype=int32) Perhaps a starting point to harmonise this behaviour and get it closer to your expectations (it still would not be really nice having to define the output array first, I guess)... what numpy is this? This was 1.6.1. I did suppress a RuntimeWarning that was raised on the first call, though: In [33]: np.around([1.67,np.nan,np.inf,-np.inf], decimals=1, out=d) /sw/lib/python2.7/site-packages/numpy/core/fromnumeric.py:37: RuntimeWarning: invalid value encountered in multiply result = getattr(asarray(obj),method)(*args, **kwds) np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16) array([ 1, -32768, -32768, -32768], dtype=int16) np.__version__ '1.5.1' a = np.ones(4, np.int16) a[:] = np.array([1.6, np.nan, np.inf, -np.inf]) a array([ 1, -32768, -32768, -32768], dtype=int16) I thought we get a ValueError to avoid nan-to-zero bugs a[2] = np.nan Traceback (most recent call last): File "<pyshell#22>", line 1, in <module> a[2] = np.nan ValueError: cannot convert float NaN to integer On master, an integer out raises a TypeError for any float input - not sure I'd consider that an improvement… np.__version__ '2.0.0.dev-8f689df' np.around([1.6,-23.42, -13.98, 0.14], out=c) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/derek/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2277, in around return _wrapit(a, 'round', decimals, out) File "/Users/derek/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 37, in _wrapit result = getattr(asarray(obj),method)(*args, **kwds) TypeError: ufunc 'rint' output (typecode 'd') could not be coerced to provided output parameter (typecode 'h') according to the casting rule 'same_kind' I thought the NaN might have been dealt with first, before casting to int, but that doesn't seem to be the case (on master, again): np.around([1.6,np.nan,np.inf,-np.inf]) array([ 2., nan, inf, -inf]) np.around([1.6,np.nan,np.inf,-np.inf]).astype(np.int16) array([2, 0, 0, 0], dtype=int16) np.around([1.6,np.nan,np.inf,-np.inf]).astype(np.int32) array([ 2, -2147483648, -2147483648, -2147483648], dtype=int32) Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] what to use in buildbot config for numpy testing
On 07.09.2011, at 10:52PM, Chris Kees wrote: Is there a recommended way to run the numpy test suite as a buildbot test? Just run a python -c "import numpy; numpy.test()" as a ShellCommand object? It would be numpy.test() [or numpy.test('full')]; then it depends on what you need as the return value of your test. I am using for package verification python -c 'import numpy, sys; ret=numpy.test("full"); sys.exit(2*len(ret.errors+ret.failures))' so python will return with a value != 0 in case of an unsuccessful test (which it otherwise would not do). But this is just within a simple shell script. Cheers, Derek -- Derek Homeier Centre de Recherche Astrophysique de Lyon ENS Lyon 46, Allée d'Italie 69364 Lyon Cedex 07, France +33 1133 47272-8894 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] load from text files Pull Request Review
On 02.09.2011, at 11:45PM, Christopher Jordan-Squire wrote: and unfortunately it's for 1D-arrays only). That's not bad for this use -- make a row a struct dtype, and you've got a 1-d array anyway -- you can optionally convert to a 2-d array after the fact. I don't know why I didn't think of using fromiter() when I built accumulator. Though what I did is a bit more flexible -- you can add stuff later on, too, you don't need to do it all at once. I'm unsure how to use fromiter for missing data. It sounds like a potential solution when no data is missing, though. Strange I haven't thought about it before either; I guess for record arrays it comes more naturally to view them as a collection of 1D arrays. However, you'd need to construct a list or something of ncolumn iterators from the input - should not be too hard; but then how do you feed the ncolumn fromiter() instances synchronously from that?? As far as I can see there is no way to make them read one item at a time, row by row. Then there are additional complications with multi-D dtypes, and in your case, especially datetime instances, but the problem that all columns have to be read in in parallel really seems to be the showstopper here. Of course for flat 2D arrays of data (all the same dtype) this would work with simply reshaping the array - that's probably even the most common use case for loadtxt, but that method lacks way too much generality for my taste. Back to accumulator, I suppose. Cheers, Derek -- Derek Homeier Centre de Recherche Astrophysique de Lyon ENS Lyon 46, Allée d'Italie 69364 Lyon Cedex 07, France +33 1133 47272-8894 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
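For the row-as-struct route, a minimal sketch of what a single fromiter() call can already do on regular text input, if I'm not mistaken about its handling of structured dtypes (file contents and field names invented; no missing-value handling):
import numpy as np
from StringIO import StringIO   # io.StringIO on Python 3

txt = StringIO("1,2.5\n3,4.0\n5,0.25\n")
dt = np.dtype([('i', 'i8'), ('v', 'f8')])
rows = ((int(i), float(v)) for i, v in
        (line.split(',') for line in txt))
arr = np.fromiter(rows, dtype=dt)   # a 1D array of records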
Re: [Numpy-discussion] Efficient way to load a 1Gb file?
On 02.09.2011, at 1:47AM, Russell E. Owen wrote: I've made a pull request https://github.com/numpy/numpy/pull/144 implementing that option as a switch 'prescan'; could you review it in particular regarding the following: Is the option reasonably named and documented? In the case the allocated array does not match the input data (which really should never happen), right now just a warning is issued, filling any excess buffer with zeros or discarding remaining input data - should this rather raise an IndexError? No prediction if/when I might be able to provide this for genfromtxt, sorry! Cheers, Derek This looks like a great improvement to me! I think the name is well chosen and the help is very clear. Thanks for your feedback, just a few quick comments: A few comments: - Might you rename the variable l? It is easily confused with the digit 1. - I don't understand the l < n_valid test, so this may be off base, but I'm surprised that you first massage the data and then raise an exception. Is the massaged data any use after the exception is raised? Naively I would expect you to issue a warning instead of raising an exception if you are going to handle the error by massaging the data. The exception is currently caught right after the loop, which might seem a bit illogical, but the idea was to handle both cases in one place (if l > n_valid, trying to assign to X[l] will also raise an IndexError, meaning there are data left in the input that could not be stored) - so the present version indeed just issues a warning for both cases, but that could easily be changed... (It is a pity that your patch duplicates so much parsing code, but I don't see a better way to do it. Putting conditionals in the parsing loop to decide how to handle each line based on prescan would presumably slow things down too much.) That was my idea behind it - otherwise I would also have considered moving it out into its own function, but as long as the entire code more or less fits into one editor window, this did not appear an obstacle to me. The main update on the issue is however, that all this is currently on hold because some concerns have been raised about not using dynamic resizing instead (the extra reading pass would break streamed input), and we have been discussing the best way to do this in another thread related to pull request https://github.com/numpy/numpy/pull/143 (which implements similar functionality, plus a lot more, for a genfromtxt-like function). So don't be surprised if the loadtxt patch comes back later, in a completely revised form… Cheers, Derek -- Derek Homeier Centre de Recherche Astrophysique de Lyon ENS Lyon 46, Allée d'Italie 69364 Lyon Cedex 07, France +33 1133 47272-8894 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] load from text files Pull Request Review
On 30.08.2011, at 6:21PM, Chris.Barker wrote: I've submitted a pull request for a new method for loading data from text files into a record array/masked record array. Click on the link for more info, but the general idea is to create a regular expression for what entries should look like and loop over the file, updating the regular expression if it's wrong. Once the types are determined the file is loaded line by line into a pre-allocated numpy array. nice stuff. Have you looked at my accumulator class, rather than pre-allocating? Less the class itself than the ideas behind it. It's easy enough to do, and would keep you from having to run through the file twice. The cost of memory re-allocation as the array grows is very small. I've posted the code recently, but let me know if you want it again. I agree it would make a very nice addition, and could complement my pre-allocation option for loadtxt - however there I've also been made aware that this approach breaks streamed input etc., so the buffer.resize(…) methods in accumulator would be the better way to go. For loadtable this is not quite as straightforward, though, because the type auto-detection, strictly done, requires scanning the entire input, since a column full of ints could still produce a float in the last row… I'd say one just has to accept that this kind of auto-detection is incompatible with input streams, and with the necessity to scan the entire data first anyway, pre-allocating the array makes sense as well. For better consistency with what people have likely got used to from npyio, I'd recommend some minor changes:
- make spaces the default delimiter
- enable automatic decompression (given the modularity, could you simply use np.lib._datasource.open() like genfromtxt?)
Cheers, Derek -- Derek Homeier Centre de Recherche Astrophysique de Lyon ENS Lyon 46, Allée d'Italie 69364 Lyon Cedex 07, France +33 1133 47272-8894 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
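To make the resize-based idea concrete, a minimal sketch of such a growing buffer (class and method names are mine, not the actual accumulator code):
import numpy as np

class Accumulator(object):
    """Growable 1D buffer backed by a plain numpy array."""
    def __init__(self, dtype=float):
        self._buf = np.empty(128, dtype=dtype)
        self._n = 0

    def append(self, item):
        if self._n >= self._buf.shape[0]:
            # grow geometrically; refcheck=False is safe as long as
            # no outside views into the buffer are kept around
            self._buf.resize(2 * self._buf.shape[0], refcheck=False)
        self._buf[self._n] = item
        self._n += 1

    def toarray(self):
        return self._buf[:self._n].copy()
Since the dtype may be a structured one, each appended item can be a whole row; the final toarray() then costs just one copy of the actual data.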
Re: [Numpy-discussion] load from text files Pull Request Review
On 02.09.2011, at 5:50PM, Chris.Barker wrote: hmmm -- it seems you could just as well be building the array as you go, and if you hit a change in the input, re-set and start again. In my tests, I'm pretty sure that the time spent on file io and string parsing swamps the time it takes to allocate memory and set the values. So there is little cost, and for the common use case, it would be faster and cleaner. There is a chance, of course, that you might have to re-wind and start over more than once, but I suspect that that is the rare case. I still haven't studied your class in detail, but one could probably actually just create a copy of the array read in so far, e.g. changing it from a dtype=[('f0', 'i8'), ('f1', 'f8')] to dtype=[('f0', 'f8'), ('f1', 'f8')] as required - or even first implement it as a list or dict of arrays, that could be individually changed, and only create a record array from that at the end. The required copying and extra memory use would definitely pale compared to the text parsing or the current memory usage for the input list. In my loadtxt version [https://github.com/numpy/numpy/pull/144] just parsing the text for comment lines adds ca. 10% time, while any of the array allocation and copying operations should at most be at the 1% level. enable automatic decompression (given the modularity, could you simply use np.lib._datasource.open() like genfromtxt?) I _think_ this would benefit from a one-pass solution as well -- so you don't need to de-compress twice. Absolutely; on compressed data the time for the extra pass jumps up to +30-50%. Cheers, Derek -- Derek Homeier Centre de Recherche Astrophysique de Lyon ENS Lyon 46, Allée d'Italie 69364 Lyon Cedex 07, France +33 1133 47272-8894 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] load from text files Pull Request Review
On 02.09.2011, at 6:16PM, Christopher Jordan-Squire wrote: I hadn't thought of that. Interesting idea. I'm surprised that completely resetting the array could be faster. I had experimented a bit with the fromiter function, which also increases the output array as needed, and this creates negligible overhead compared to parsing the text input (it is implemented in C, though, I don't know how the .resize() calls would compare to that; and unfortunately it's for 1D-arrays only). In my tests, I'm pretty sure that the time spent on file io and string parsing swamps the time it takes to allocate memory and set the values. In my tests, at least for a medium sized csv file (about 3000 rows by 30 columns), about 10% of the time was spent determining the types in the first read-through and 90% of the time was spent sticking the data in the array. This would be consistent with my experience (basically testing for comment characters and the length of line.split(delimiter) in the first pass). However, that particular test took more time for reading in because the data was quoted (so converting '3,25' to a float took between 1.5x and 2x as long as '3.25') and the datetime conversion is costly. Regardless, that suggests making the data loading faster is more important than avoiding reading through the file twice. I guess that intuition probably breaks if the data doesn't fit into memory, though. But I haven't worked with extremely large data files before, so I'd appreciate refutation/confirmation of my priors. The lion's share of the data loading time, by my experience, is still in the string operations (like the comma conversion you quote above), so I'd always expect any subsequent manipulations of the numpy array data to be very fast compared to that. Maybe this changes slightly with more complex data types like string records or datetime instances, but as you indicate, even for those the conversion seems to dominate the cost. Cheers, Derek -- Derek Homeier Centre de Recherche Astrophysique de Lyon ENS Lyon 46, Allée d'Italie 69364 Lyon Cedex 07, France +33 1133 47272-8894 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] saving groups of numpy arrays to disk
On 25.08.2011, at 8:42PM, Chris.Barker wrote: On 8/24/11 9:22 AM, Anthony Scopatz wrote: You can use Python pickling, if you do *not* have a requirement for: I can't recall why, but it seems pickling of numpy arrays has been fragile and not very performant. Hmm, the pure Python version might be, but I've used cPickle for a long time and never noted any stability problems. And it is still noticeably faster than pytables, in my experience. Still, for the sake of a standardised format I'd go with HDF5 any time now (and usually prefer h5py now when starting anything new - my pytables implementation mentioned above likely is not the most efficient compared to cPickle). But with the usual disclaimers, you should be able to simply use cPickle as a drop-in replacement in the example below. Cheers, Derek On 21.08.2011, at 2:24PM, Pauli Virtanen wrote: You can use Python pickling, if you do *not* have a requirement for: - real persistence, i.e., being able to easily read the data years later - a standard data format - access from non-Python programs - safety against malicious parties (unpickling can execute some code in the input -- although this is possible to control) then you can use Python pickling:
import pickle
file = open('out.pck', 'wb')
pickle.dump(tree, file, protocol=pickle.HIGHEST_PROTOCOL)
file.close()
file = open('out.pck', 'rb')
tree = pickle.load(file)
file.close()
___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
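For comparison, the h5py route is about equally short - a minimal sketch (file and dataset names invented, and assuming 'tree' is a plain ndarray):
import numpy as np
import h5py

tree = np.arange(10.)   # stand-in data
with h5py.File('out.h5', 'w') as f:
    f.create_dataset('tree', data=tree)

with h5py.File('out.h5', 'r') as f:
    tree2 = f['tree'][:]    # read the dataset back as an ndarray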
[Numpy-discussion] array_equal and array_equiv comparison functions for structured arrays
Hi, as the subject says, the array_* comparison functions currently do not operate on structured/record arrays. Pull request https://github.com/numpy/numpy/pull/146 implements these comparisons. There are two commits, differing in their interpretation of whether two arrays with different field names, but identical data, are equivalent; i.e. res = array_equiv(array((1,2), dtype=[('i','i4'),('v','f8')]), array((1,2), dtype=[('n','i4'),('f','f8')])) is True in the current HEAD, but False in its parent. Feedback and additional comments are invited. Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
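Spelled out, the two arrays in question would be constructed as below; whether the comparison then returns True depends on which of the two commits is applied:
import numpy as np

a = np.array((1, 2.0), dtype=[('i', 'i4'), ('v', 'f8')])
b = np.array((1, 2.0), dtype=[('n', 'i4'), ('f', 'f8')])
# identical field data and field types, but different field names:
# np.array_equiv(a, b) -> True with the HEAD commit, False with its parent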
Re: [Numpy-discussion] Trim a numpy array in numpy.
Hi Hongchun, On 16 Aug 2011, at 23:19, Hongchun Jin wrote: I have a question regarding how to trim a string array in numpy. import numpy as np x = np.array(['aaa.hdf', 'bbb.hdf', 'ccc.hdf', 'ddd.hdf']) I expect to trim a certain part of each element in the array, for example '.hdf', giving me ['aaa', 'bbb', 'ccc', 'ddd']. Of course, I can do a loop thing. However, in my actual dataset, I have more than one million elements in such an array. So I am wondering, is there a faster and better way to do it, like the STRMID function in IDL? I tried to google it, but it turns out that I cannot find any discussion about it. Thanks. For a case like the above, if you really have all constant-length strings and want to truncate to a fixed length, you could simply do x.astype('|S3') For more complex cases like trimming regex patterns I can't think of a numpy solution right now, coding the loop in cython might be a better bet there... Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Trim a numpy array in numpy.
On 16 Aug 2011, at 23:51, Hongchun Jin wrote: Thanks Derek for the quick reply. But I am sorry, I did not make it clear in my last email. Assume I have an array like ['CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' ..., 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf'] I need to get the sub-string for date and time, for example, '2008-01-31T23-56-35ZD', in the middle of each element. In more general cases, the sub-string could be any part of the string in such an array. I hope to assign the start and stop of the sub-string when I am subsetting it. Well, maybe I was a bit too quick in my reply - see the documentation for np.char for some vectorized array operations that might be of use. Unfortunately, operations like 'lstrip' and 'rstrip' don't do exactly what you might expect them to, but you could use for example np.char.split(x,'.') to create an array of lists of substrings and then deal with them; something like removing the '.hdf' suffix would already require a somewhat lengthy recursion: np.char.rstrip(np.char.rstrip(np.char.rstrip(np.char.rstrip(x, 'f'), 'd'), 'h'), '.') To also remove the leading substring in your case would clearly lead to a very clumsy expression... It turns out, however, that something like the above for a similar test case with a length 10 array takes about 3 times longer than the np.char.split() way; but even that is slower than a direct loop over string functions: In [6]: %timeit -n 10 y = np.char.split(x, '.') 10 loops, best of 3: 188 ms per loop In [7]: %timeit -n 10 y = np.char.split(x, '.'); z = np.fromiter( (l[1] for l in y), dtype='|S3', count=x.shape[0]) 10 loops, best of 3: 218 ms per loop In [8]: %timeit -n 10 z = np.fromiter( (l.split('.')[1] for l in x), dtype='|S3', count=x.shape[0]) 10 loops, best of 3: 143 ms per loop So it seems all of the vectorization in np.char is not that great after all (and the direct loop might still be acceptable for 1.e6 elements...)! Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Efficient way to load a 1Gb file?
On 10 Aug 2011, at 19:22, Russell E. Owen wrote: A coworker is trying to load a 1Gb text data file into a numpy array using numpy.loadtxt, but he says it is using up all of his machine's 6Gb of RAM. Is there a more efficient way to read such text data files? The npyio routines (loadtxt as well as genfromtxt) first read in the entire data as lists, which of course creates significant overhead, but is not easy to circumvent, since numpy arrays are immutable - so you have to first store the numbers in some kind of mutable object. One could write a custom parser that tries to be somewhat more efficient, e.g. first reading in sub-arrays from a smaller buffer. Concatenating those sub-arrays would still require about twice the memory of the final array. I don't know if using the array.array type (which is mutable) is much more efficient than a list... To really avoid any excess memory usage you'd have to know the total data size in advance - either by reading the file in a first pass to count the rows, or by explicitly specifying it to a custom reader. Basically, assuming a completely regular file without missing values etc., you could then read in the data like
X = np.zeros((n_lines, n_columns), dtype=float)
delimiter = ' '
for n, line in enumerate(file(fname, 'r')):
    X[n] = np.array(line.split(delimiter), dtype=float)
(adjust delimiter and dtype as needed...) HTH, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Efficient way to load a 1Gb file?
On 10 Aug 2011, at 22:03, Gael Varoquaux wrote: On Wed, Aug 10, 2011 at 04:01:37PM -0400, Anne Archibald wrote: A 1 Gb text file is a miserable object anyway, so it might be desirable to convert to (say) HDF5 and then throw away the text file. +1 There might be concerns about ensuring data accessibility against throwing the text file away, but converting to HDF5 would be an elegant way to read it in without the memory issues, too (I must confess though, I've regularly read ~ 1GB ASCII files into memory - with decent virtual memory management it did not turn out too bad...) Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [ANN] Cython 0.15
On 7 Aug 2011, at 04:09, Sturla Molden wrote: On 06.08.2011 11:18, Dag Sverre Seljebotn wrote: We are excited to announce the release of Cython 0.15, which is a huge step forward in achieving full Python language coverage as well as many new features, optimizations, and bugfixes. This is really great. With Cython progressing like this, I might soon have written my last line of Fortran. :-) +1 (except the bit about writing Fortran, probably ;-) I am only getting 4 errors with Python 3.1 + 3.2 (Mac OS X 10.6/x86_64): compiling (cpp) and running numpy_bufacc_T155, numpy_cimport, numpy_parallel, numpy_test... I could not find much documentation about the runtests.py script (like how to figure out the exact gcc command used), but I am happy to send more details wherever requested. Adding a '-v' flag prints the following additional info: numpy_bufacc_T155.c: In function ‘PyInit_numpy_bufacc_T155’: numpy_bufacc_T155.c:3652: warning: ‘return’ with no value, in function returning non-void .numpy_bufacc_T155.cpp: In function ‘PyObject* PyInit_numpy_bufacc_T155()’: numpy_bufacc_T155.cpp:3652: error: return-statement with no value, in function returning ‘PyObject*’ Enumpy_cimport.c: In function ‘PyInit_numpy_cimport’: numpy_cimport.c:3327: warning: ‘return’ with no value, in function returning non-void .numpy_cimport.cpp: In function ‘PyObject* PyInit_numpy_cimport()’: numpy_cimport.cpp:3327: error: return-statement with no value, in function returning ‘PyObject*’ Enumpy_parallel.c: In function ‘PyInit_numpy_parallel’: numpy_parallel.c:3824: warning: ‘return’ with no value, in function returning non-void .numpy_parallel.cpp: In function ‘PyObject* PyInit_numpy_parallel()’: numpy_parallel.cpp:3824: error: return-statement with no value, in function returning ‘PyObject*’ Enumpy_test.c: In function ‘PyInit_numpy_test’: numpy_test.c:11611: warning: ‘return’ with no value, in function returning non-void .numpy_test.cpp: In function ‘PyObject* PyInit_numpy_test()’: numpy_test.cpp:11611: error: return-statement with no value, in function returning ‘PyObject*’ This happens with numpy 1.5.1, 1.6.0, 1.6.1 or git master installed. With Python 2.5-2.7 all 5536 tests are passing! Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [ANN] Cython 0.15
On 7 Aug 2011, at 22:31, Paul Anton Letnes wrote: Looks like you have done some great work! I've been using f2py in the past, but I always liked the idea of cython - gradually wrapping more and more code as the need arises. I read somewhere that fortran wrapping with cython was coming - dare I ask what the status on this is? Is it a goal for cython to support easy fortran wrapping at all? Don't know if there is one besides fwrap, but http://pypi.python.org/pypi/fwrap/0.1.1 builds and tests OK on python 2.[5-7]. So I am bound to continue my Fortran writing... Keep up the good work! Absolutely agreed! Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [ANN] Cython 0.15
On 7 Aug 2011, at 23:27, Dag Sverre Seljebotn wrote:
>> numpy_test.c: In function ‘PyInit_numpy_test’:
>> numpy_test.c:11611: warning: ‘return’ with no value, in function returning non-void
>> numpy_test.cpp: In function ‘PyObject* PyInit_numpy_test()’:
>> numpy_test.cpp:11611: error: return-statement with no value, in function returning ‘PyObject*’
>> This happens with numpy 1.5.1, 1.6.0, 1.6.1 or git master installed. With Python 2.5-2.7 all 5536 tests are passing!
> I believe this is http://projects.scipy.org/numpy/ticket/1919 - can you confirm? I don't think there's anything we can do on the Cython end to fix this, if the report is correct.
Yes, the proposed patch fixes the errors! I have added a comment to the ticket, hopefully this can be merged soon.
Cheers, Derek
[Numpy-discussion] longlong format error with Python <= 2.6 in scalartypes.c
Hi,
commits c15a807e and c135371e (thus most immediately addressed to Mark, but I am sending this to the list hoping for more insight on the issue) introduce a test failure with Python 2.5 and 2.6 on Mac:

FAIL: test_timedelta_scalar_construction (test_datetime.TestDateTime)
Traceback (most recent call last):
  File "/Users/derek/lib/python2.6/site-packages/numpy/core/tests/test_datetime.py", line 219, in test_timedelta_scalar_construction
    assert_equal(str(np.timedelta64(3, 's')), '3 seconds')
  File "/Users/derek/lib/python2.6/site-packages/numpy/testing/utils.py", line 313, in assert_equal
    raise AssertionError(msg)
AssertionError: Items are not equal:
 ACTUAL: '%lld seconds'
 DESIRED: '3 seconds'

due to the "%lld" format passed to PyUString_FromFormat in scalartypes.c. In the current npy_common.h I found the comment
 * in Python 2.6 the %lld formatter is not supported. In this
 * case we work around the problem by using the %zd formatter.
though I did not notice that problem when I cleaned up the NPY_LONGLONG_FMT definitions in that file (and it is not entirely clear whether the comment only pertains to Windows...). Anyway, changing the formatters in scalartypes.c to "%zd" as well removes the failure and still works with Python 2.7 and 3.2 (at least on Mac OS). However I am wondering if
a) NPY_[U]LONGLONG_FMT should also be defined conditional to the Python version (and if "%zu" is a valid formatter), and
b) scalartypes.c should use NPY_LONGLONG_FMT from npy_common.h
I am attaching a patch implementing a), but only the quick and dirty solution to b).
Cheers, Derek

npy_longlong_fmt.patch
Description: Binary data
Re: [Numpy-discussion] Segmentation Fault in Numpy.test()
On 2 Aug 2011, at 18:57, Thomas Markovich wrote:
> It appears that uninstalling python 2.7 and installing the scipy superpack with the apple standard python removes the segfaulting behavior from numpy.
Did the superpack installer automatically install numpy to the python2.7 directory when present? Even if so, I reckon you could simply reinstall python2.7 after the numpy installation (still calling python2.6 to use numpy of course...).
> Now it appears that just scipy is segfaulting at test test_arpack.test_hermitian_modes(True, std-hermitian, 'F', 2, 'SM', None, 0.5, <function aslinearoperator at 0x1043b1848>) ... Segmentation fault
Which architecture is this? Being on Snow Leopard, probably x86_64... I remember encountering similar problems on PPC, which I suspect are related to stability issues with Apple's Accelerate framework.
Cheers, Derek
Re: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice
On 2 Aug 2011, at 19:15, Christopher Barker wrote:
> In [32]: s = numpy.array(a, dtype=tfc_dtype)
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> /Users/cbarker/<ipython console> in <module>()
> TypeError: expected a readable buffer object
>
> OK -- I can see why you'd expect that to work. However, the trick with structured dtypes is that the dimensionality of the inputs can be less than obvious -- you are passing in a 1-d list of 4 numbers -- do you want a 1-d array? or ? -- in this case, it's pretty obvious (as a human) what you would want -- you have a dtype with four fields, and you're passing in four numbers, but there are so many possible combinations that numpy doesn't try to be smart about it. So as a rule, you need to be quite specific when working with structured dtypes. However, the default is for numpy to map tuples to dtypes, so if you pass in a tuple instead, it works:
> In [34]: t = tuple(a)
> In [35]: s = numpy.array(t, dtype=tfc_dtype)
> In [36]: s
> Out[36]: array((32000L, 0.789131, 0.00805999, 3882.22), dtype=[('nps', '<u8'), ('t', '<f8'), ('e', '<f8'), ('fom', '<f8')])
> you were THIS close!
Thanks for the detailed discussion! BTW this works also without explicitly converting the words one by one:

In [1]: l = ' 32000 7.89131E-01 8.05999E-03 3.88222E+03'
In [2]: tfc_dtype = numpy.dtype([('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom', 'f8')])
In [3]: numpy.array(tuple(l.split()), dtype=tfc_dtype)
Out[3]: array((32000L, 0.789131, 0.00805999, 3882.22), dtype=[('nps', '<u8'), ('t', '<f8'), ('e', '<f8'), ('fom', '<f8')])

Cheers, Derek
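A hedged side note: for a regular file of such rows, loadtxt should also accept the structured dtype directly and do the splitting and tuple-packing itself - a sketch, not tested against every numpy version of the time:

import numpy as np
from StringIO import StringIO

tfc_dtype = np.dtype([('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom', 'f8')])
l = ' 32000 7.89131E-01 8.05999E-03 3.88222E+03'
s = np.loadtxt(StringIO(l), dtype=tfc_dtype)
# s should come back as a 0-d (or length-1) structured array,
# with s['nps'] == 32000 etc.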
Re: [Numpy-discussion] Can I index array starting with 1?
On 29.07.2011, at 1:19AM, Stéfan van der Walt wrote:
> On Thu, Jul 28, 2011 at 4:10 PM, Anne Archibald <aarch...@physics.mcgill.ca> wrote:
>> Don't forget the everything-looks-like-a-nail approach: make all your arrays one bigger than you need and ignore element zero.
> Hehe, why didn't I think of that :) I guess the kind of problem I struggle with more frequently is books written with summations over -m to +n. In those cases, it's often convenient to use the mapping function, so that I can enter the formulas as they occur.
I don't want to open any cans of worms at this point, but given that Fortran90 supports such indexing (arbitrary limits, including negative ones), there definitely are use cases for it (or rather, instances where it is very convenient at least, like in Stéfan's books). So I am wondering how much it would take to implement such an enhancement for the standard ndarray...
Cheers, Derek
Re: [Numpy-discussion] Can I index array starting with 1?
On 29.07.2011, at 1:38AM, Anne Archibald wrote:
> The can is open and the worms are everywhere, so: The big problem with one-based indexing for numpy is interpretation. In python indexing, -1 is the last element of the array, and ranges have a specific meaning. In a hypothetical one-based indexing scheme, would the last element be element 0? if not, what does looking up zero do? What about ranges - do ranges still include the first endpoint and not the second? I suppose one could choose the most pythonic of the 1-based conventions, but do any of them provide from-the-end indexing without special syntax?
I forgot, this definitely needs to be preserved for ndarray!
> Once one had decided what to do, implementation would be pretty easy - just make a subclass of ndarray that replaces the indexing function.
In fact, Stéfan's reshuffling trick does nearly everything I would expect for using negative indices; maybe the only functionality needed to implement is
1. define an attribute like x.start that could tell appropriate functions (e.g. for print(x) or plot(x)) the zero-point, so x would be evaluated e.g. at x[-5], wrapping around at x[-1], x[0] to x[6]... Should have the advantage that anything that's not yet aware of this attribute could simply ignore it.
2. allow to automatically set this starting point when creating something like x = np.zeros(-5:7) or setting a shape to (-5:7) - but maybe the latter is leading into very dangerous territory already...
Cheers, Derek
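To make point 1. concrete, a very rough sketch of what such a subclass could look like (the name OffsetArray and the exact semantics are made up; slices are left untouched, and from-the-end indexing is precisely the open question, so plain integers are simply shifted here):

import numpy as np

class OffsetArray(np.ndarray):
    # sketch only: 1-d array indexed from an arbitrary start value
    def __new__(cls, data, start=0):
        obj = np.asarray(data).view(cls)
        obj.start = start
        return obj

    def __array_finalize__(self, obj):
        self.start = getattr(obj, 'start', 0)

    def __getitem__(self, i):
        # shift plain integer indices by the start offset; note that
        # anything indexing internally (like repr) would also see the shift
        if isinstance(i, int):
            i = i - self.start
        return np.ndarray.__getitem__(self, i)

x = OffsetArray(np.arange(12), start=-5)   # valid indices -5 ... 6
print x[-5], x[6]                          # first and last element: 0 11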
Re: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2
On 07.07.2011, at 7:16PM, Robert Pyle wrote:
> .../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:1922: RuntimeWarning: invalid value encountered in absolute
>   return all(less_equal(absolute(x-y), atol + rtol * absolute(y)))
> Everything else completes with 3 KNOWNFAILs and 1 SKIP. This warning is not new to this release; I've seen it before but haven't tried tracking it down until today. It arises in allclose(). The comments state "If either array contains NaN, then False is returned." but no test for NaN is done, and NaNs are indeed what cause the warning. Inserting
>   if any(isnan(x)) or any(isnan(y)): return False
> before current line number 1916 in numeric.py seems to fix it.
The same warning is still present in the current master, I just never paid attention to it because the tests still pass (it does correctly identify NaNs because they are not less_equal the tolerance), but of course this should be properly fixed as you suggest.
Cheers, Derek
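A standalone sketch of the suggested fix (same logic as the proposed insertion into numeric.py, just as a free function):

import numpy as np

def allclose_nansafe(x, y, rtol=1.e-5, atol=1.e-8):
    # return False explicitly if either input contains NaN, instead of
    # relying on the NaNs merely failing the less_equal comparison
    x, y = np.asarray(x), np.asarray(y)
    if np.any(np.isnan(x)) or np.any(np.isnan(y)):
        return False
    return bool(np.all(np.less_equal(np.abs(x - y), atol + rtol * np.abs(y))))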
Re: [Numpy-discussion] broacasting question
On 30.06.2011, at 7:32PM, Thomas K Gamble wrote:
> I'm trying to convert some IDL code to python/numpy and I'm having some trouble understanding the rules for broadcasting during some operations. example: given the following arrays:
> a = array((2048,3577), dtype=float)
> b = array((256,25088), dtype=float)
> c = array((2048,3136), dtype=float)
> d = array((2048,3136), dtype=float)
> do: a = b * c + d
> In IDL, the computation is done without complaint and all array sizes are preserved. In python I get a value error concerning broadcasting. I can force it to work by taking slices, but the resulting size would be a = (256x3136) rather than (2048x3577). I admit that I don't understand IDL (or python to be honest) well enough to know how it handles this to be able to replicate the result properly. Does it only operate on the smallest dimensions ignoring the larger indices leaving their values unchanged? Can someone explain this to me?
If IDL does such operations silently I'd probably rather be troubled about it...
Assuming you actually meant something like a = np.ndarray((2048,3577)) (because np.array((2048,3577), dtype=float) would simply be the 2-vector [ 2048. 3577.]), the shape of a indeed matches in no way the other ones. While b, c, d do have the same total size, thus something like
b.reshape((2048,3136)) * c + d
will work, meaning the first 8 rows of b, b[:8], would be concatenated to the first row of the output, and so on... Since the total size is still smaller than a, I could only venture something is done like
np.add(b.reshape(2048,3136) * c, d, out=a[:,:3136])
But to say whether this is really the equivalent result to what IDL does, one would have to study the IDL manual in detail or directly compare the output (e.g. check what happens to the values in a[:,3136:]...)
Cheers, Derek
Re: [Numpy-discussion] broacasting question
On 30.06.2011, at 11:57PM, Thomas K Gamble wrote:
>> np.add(b.reshape(2048,3136) * c, d, out=a[:,:3136])
>> But to say whether this is really the equivalent result to what IDL does, one would have to study the IDL manual in detail or directly compare the output (e.g. check what happens to the values in a[:,3136:]...)
> Your post gave me the clue I needed. I had my shapes slightly off in the example I gave, but if I try:
> a = reshape(b.flatten('F') * c.flatten('F') + d.flatten('F'), b.shape, order='F')
> I get a result in line with the IDL result. Another example with different total size arrays:
> b = np.ndarray((2048,3577), dtype=float)
> c = np.ndarray((256,25088), dtype=float)
> a = reshape(b.flatten('F')[:size(c)]/c.flatten('F'), c.shape, order='F')
> This also gives a result like that of IDL.
Right, I forgot to point out that there are at least 2 ways to bring the arrays into compatible shapes (that's the reason broadcasting does not work here, because numpy only does automatic broadcasting if there is an unambiguous way to do so). So the IDL arrays being Fortran-ordered is the essential bit of information here. Just two remarks:
I. Assigning a = reshape(b.flatten('F')[:size(c)]/c.flatten('F'), c.shape, order='F') as above will create a new array of shape c.shape - if you wanted to put your results into an existing array of shape (2048,3577), you'd still have to explicitly say a[:,:3136] = ...
II. The flatten() operations and the assignment above all create full copies of the arrays, thus the np.add ufunc above together with simple reshape operations might improve performance somewhat - however keeping the Fortran order also requires some costly transpositions, as for your last example
a = np.divide(b.T[:3136].reshape(c.T.shape).T, c, out=a)
so YMMV...
Cheers, Derek
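For anyone wanting to check the equivalence on a small scale, the two variants can be compared directly (shapes scaled down from the example above, values arbitrary):

import numpy as np

b = np.arange(8 * 14.).reshape(8, 14)    # stands in for the (2048, 3577) array
c = np.arange(1., 97.).reshape(4, 24)    # stands in for (256, 25088)

# Fortran-order flatten/reshape, as in the IDL-like solution above
a1 = np.reshape(b.flatten('F')[:c.size] / c.flatten('F'), c.shape, order='F')
# transpose-based variant avoiding the order= keyword
a2 = b.T.ravel()[:c.size].reshape(c.T.shape).T / c

print np.allclose(a1, a2)                # -> True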
Re: [Numpy-discussion] fast numpy i/o
On 27.06.2011, at 6:36PM, Robert Kern wrote:
>> Some late comments on the note (I was a bit surprised that HDF5 installation seems to be a serious hurdle to many - maybe I've just been profiting from the fink build system for OS X here - but I also was not aware that the current netCDF is built on top of the HDF5 standard, something useful learnt again...:-)
> It's not so much that it's hard to build for lots of people. Rather, it would be quite difficult to include into numpy itself, particularly if we are just relying on distutils. numpy is too fundamental of a package to have extra dependencies.
I completely agree with that - I doubt that there is a standard data format that is both in wide use and simple enough to be easily included with numpy. Pyfits might be possible, but that again probably addresses too specific a user group.
Cheers, Derek
Re: [Numpy-discussion] fast numpy i/o
On 21.06.2011, at 7:58PM, Neal Becker wrote:
> I think, in addition, that hdf5 is the only one that easily interoperates with matlab? speaking of hdf5, I see:
> pyhdf5io 0.7 - Python module containing high-level hdf5 load and save functions.
> h5py 2.0.0 - Read and write HDF5 files from Python
> Any thoughts on the relative merits of these?
In my experience, HDF5 access usually approaches disk access speed, and random access to sub-datasets should be significantly faster than reading in the entire file, though I have not been able to test this. I have not heard about pyhdf5io (how does it work together with numpy?) - as an alternative to h5py I'd rather recommend pytables, though I prefer the former (h5py) for its cleaner/simpler interface (but that probably depends on your programming habits).
HTH, Derek
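To illustrate the random-access point, reading a single row or slice of an HDF5 dataset with h5py only pulls that part from disk (file and dataset names made up):

import h5py

f = h5py.File('results.h5', 'r')
spectra = f['run1/spectra']        # just a handle, no data read yet
one_row = spectra[42]              # reads only this row from disk
window = spectra[100:200, 0:512]   # or an arbitrary rectangular slice
f.close()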
Re: [Numpy-discussion] npyio - gzip 271 Error -3 while decompressing ?
Hi Denis,
On 20 Jun 2011, at 19:04, denis wrote:
> a separate question, have you run genfromtxt( xx.csv.gz ) lately ?
I haven't, and I was not particularly involved with it before this patch, so this would possibly be better addressed to the list.
> On .../scikits.learn-0.8/scikits/learn/datasets/data/digits.csv.gz, numpy 1.6.0, py 2.6 mac, I get
> X = np.genfromtxt( filename, delimiter="," )
> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/numpy/lib/npyio.py", line 1271, in genfromtxt
>   first_line = fhd.next()
> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/gzip.py", line 438, in next
>   line = self.readline()
> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/gzip.py", line 393, in readline
>   c = self.read(readsize)
> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/gzip.py", line 219, in read
>   self._read(readsize)
> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/gzip.py", line 271, in _read
>   uncompress = self.decompress.decompress(buf)
> zlib.error: Error -3 while decompressing: invalid distance too far back
> It would be nice to fix this too, if it hasn't been already. Btw the file gunzips fine.
I could reproduce that error for the gzip'ed csv files in that directory; it can be isolated to the underlying gzip call above -
fhd = gzip.open('digits.csv.gz', 'rbU'); fhd.next()
produces the same error for these files with all python2.x versions on my Mac, but not with python3.x. Also only with the 'U' mode specified, yet the same mode is parsing other .gz files just fine. I could not really track down what the 'U' flag is doing in gzip.py, but presumably it is requesting universal-newline handling. Also it's a mystery to me what is different in those files that would trigger the error. I even read them in with loadtxt() and wrote them back using constant line width and/or spaces as separators, still producing the same exception. The obvious place to fix this (or work around a bug in python2's gzip.py, whatever) would be changing the open command in genfromtxt,
fhd = iter(np.lib._datasource.open(fname, 'rbU'))
to omit the 'U' at least with python2. Alternatively one could do a test read and catch the exception, to then re-open the file with mode 'rb'...
Pierre, if you are reading this, can you comment how important the 'U' is for performance considerations or such?
HTH, Derek
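The test-read-and-reopen workaround could look like this sketch (Python 2, with fname as a placeholder for the actual file name):

import gzip
import zlib

fname = 'digits.csv.gz'   # placeholder

# try universal-newline mode first; if the python2 gzip problem strikes,
# fall back to plain binary mode
try:
    fhd = gzip.open(fname, 'rbU')
    first_line = fhd.next()
except zlib.error:
    fhd = gzip.open(fname, 'rb')
    first_line = fhd.next()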
Re: [Numpy-discussion] loadtxt ndmin option
On 31 May 2011, at 21:28, Pierre GM wrote:
> On May 31, 2011, at 6:37 PM, Derek Homeier wrote:
>> On 31 May 2011, at 18:25, Pierre GM wrote:
>>> On May 31, 2011, at 5:52 PM, Derek Homeier wrote:
>>>> I think stuff like multiple delimiters should have been dealt with before, as the right place to insert the ndmin code (which includes the decision to squeeze or not to squeeze as well as to add additional dimensions, if required) would be right at the end before the 'unpack' switch, or rather replacing the bit:
>>>> if usemask:
>>>>     output = output.view(MaskedArray)
>>>>     output._mask = outputmask
>>>> if unpack:
>>>>     return output.squeeze().T
>>>> return output.squeeze()
>>>> But there it's already not clear to me how to deal with the MaskedArray case...
>>> Oh, easy. You need to replace only the last three lines of genfromtxt with the ones from loadtxt (808-833). Then, if usemask is True, you need to use ma.atleast_Xd instead of np.atleast_Xd. Et voilà. Comments: * I would raise an exception if ndmin isn't correct *before* trying to read the file... * You could define a `collapse_function` that would be `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on the values of `usemask` and `ndmin`...
>> Thanks, that helped to clean up a little bit.
>>> If you have any question about numpy.ma, don't hesitate to contact me directly.
>> Thanks for the directions! I was not sure about the usemask case because it presently does not invoke .squeeze() either...
> The idea is that if `usemask` is True, you build a second array (the mask), that you attach to your main array at the very end (in the `output = output.view(MaskedArray); output._mask = mask` combo...). Afterwards, it's a regular MaskedArray that supports the .squeeze() method...
OK, in both cases output.squeeze() is now used if ndim > ndmin and usemask is False - at least it does not break any tests, so it seems to work with MaskedArrays as well.
>> On a possibly related note, genfromtxt also treats the 'unpack'ing of structured arrays differently from loadtxt (which returns a list of arrays in that case) - do you know if this is on purpose, or also rather missing functionality (I guess it might break recfromtxt()...)?
> Keep in mind that I haven't touched genfromtxt since 8-10 months or so. I wouldn't be surprised that it were lagging a bit behind loadtxt in terms of development. Yes, there'll be some tweaking to do for recfromtxt (it's OK for now if `ndmin` and `unpack` are the defaults) and others, but nothing major.
Well, at long last I got to implement the above and added the corresponding tests for genfromtxt - with the exception of the dimension-0 cases, since genfromtxt raises an error on empty files. There already is a comment that it should perhaps better return an empty array, so I am putting that idea up for discussion here again. I tried to devise a very basic test with masked arrays, just added to test_withmissing now. I also implemented the same unpacking behaviour for structured arrays and just made recfromtxt set unpack=False to work (or should it issue a warning?). The patches are up for review as commit 8ac01636 in my iocnv-wildcard branch:
https://github.com/dhomeier/numpy/compare/master...iocnv-wildcard
Cheers, Derek
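Condensed into one helper, the shape-handling logic discussed above might read like this (a sketch, not the actual npyio code; the transpose for ndmin=2 mirrors loadtxt's single-column behaviour):

import numpy as np
import numpy.ma as ma

def _fix_dimensions(output, ndmin=0, unpack=False, usemask=False):
    # collapse extra length-1 dimensions first ...
    if output.ndim > ndmin:
        output = output.squeeze()
    # ... then pad back up to the requested minimum dimension,
    # using the masked-array versions where needed
    mod = ma if usemask else np
    if ndmin == 1 and output.ndim < 1:
        output = mod.atleast_1d(output)
    elif ndmin == 2 and output.ndim < 2:
        output = mod.atleast_2d(output).T   # a single column stays a column
    if unpack:
        return output.T
    return output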
Re: [Numpy-discussion] ANN: Numpy 1.6.1 release candidate 1
Hi Ralf,
On 19 Jun 2011, at 12:28, Ralf Gommers wrote:
>> numpy.test('full')
>> Running unit tests for numpy
>> NumPy version 1.6.1rc1
>> NumPy is installed in /sw/lib/python3.2/site-packages/numpy
>> Python version 3.2 (r32:88445, Mar 1 2011, 18:28:16) [GCC 4.0.1 (Apple Inc. build 5493)]
>> nose version 1.0.0
>> ...
>> FAIL: test_return_character.TestF77ReturnCharacter.test_all
>> Traceback (most recent call last):
>>   File "/sw/lib/python3.2/site-packages/nose/case.py", line 188, in runTest
>>     self.test(*self.arg)
>>   File "/sw/lib/python3.2/site-packages/numpy/f2py/tests/test_return_character.py", line 78, in test_all
>>     self.check_function(getattr(self.module, name))
>>   File "/sw/lib/python3.2/site-packages/numpy/f2py/tests/test_return_character.py", line 12, in check_function
>>     r = t(array('ab')); assert_( r==asbytes('a'), repr(r))
>>   File "/sw/lib/python3.2/site-packages/numpy/testing/utils.py", line 34, in assert_
>>     raise AssertionError(msg)
>> AssertionError: b' '
>> FAIL: test_return_character.TestF90ReturnCharacter.test_all
>> Traceback (most recent call last):
>>   File "/sw/lib/python3.2/site-packages/nose/case.py", line 188, in runTest
>>     self.test(*self.arg)
>>   File "/sw/lib/python3.2/site-packages/numpy/f2py/tests/test_return_character.py", line 136, in test_all
>>     self.check_function(getattr(self.module.f90_return_char, name))
>>   File "/sw/lib/python3.2/site-packages/numpy/f2py/tests/test_return_character.py", line 12, in check_function
>>     r = t(array('ab')); assert_( r==asbytes('a'), repr(r))
>>   File "/sw/lib/python3.2/site-packages/numpy/testing/utils.py", line 34, in assert_
>>     raise AssertionError(msg)
>> AssertionError: b' '
>> (both errors are raised on the first function tested, t0).
> Could you open a ticket for these? If it's f2py + py3.2 + ppc only I'd like to ignore them for 1.6.1.
And 3.1, but I agree.
http://projects.scipy.org/numpy/ticket/1871
Cheers, Derek
Re: [Numpy-discussion] genfromtxt converter question
On 18 Jun 2011, at 04:48, gary ruben wrote:
> Thanks guys - I'm happy with the solution for now. FYI, Derek's suggestion doesn't work in numpy 1.5.1 either. For any developers following this thread, I think this might be a nice use case for genfromtxt to handle in future.
Numpy 1.6.0 and above is handling it in the present.
> As a corollary of this problem, I wonder whether there's a human-readable text format for complex numbers that genfromtxt can currently easily parse into a complex array? Having the hard-coded value for the number of columns in the converter and the genfromtxt call goes against the philosophy of the function's ability to form an array of shape matching the input layout.
genfromtxt and (for regular data such as yours) loadtxt can also parse the standard '1.0+3.14j ...' format practically automatically, as long as you specify 'dtype=complex' (and an appropriate delimiter, if required). loadtxt does not handle dtype=np.complex64 or np.complex256 at this time due to a bug in the default converters; I have just created a patch for that. In principle genfromtxt and loadtxt in numpy 1.6 can also handle cases similar to your input, but you need to change the tuples at least to contain no spaces - '( -1.4249, 1.7330)' -> '(-1.4249,+1.7330)' - or additionally insert another delimiter like ';' - otherwise it's hopeless to correctly infer the number of columns. With such input,
np.genfromtxt(a, converters=cnv, dtype=complex)
works as expected, and I have also just created a patch that would allow users to more easily specify a default converter for all input data rather than constructing one for every column. A wider-reaching automatic detection of complex formats might be feasible for genfromtxt, but I'd suspect it could create quite some overhead, as I can think of at least two formats that should probably be considered - tuples as above, and pairs of 'real,imag' without parentheses. But feel free to create a ticket if you think this would be an important enhancement.
Cheers, Derek
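A minimal example of the 'practically automatic' case (runnable as is, with made-up values):

import numpy as np
from StringIO import StringIO

a = StringIO("1.0+3.14j 2.0-0.5j\n-1.0+0.0j 0.5+2.0j")
b = np.genfromtxt(a, dtype=complex)
# b comes back as a (2, 2) complex array; whitespace is the default delimiter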
[Numpy-discussion] Enhancements and bug fix for npyio
Hi,
in my experience numbers loaded from text files very often require identical conversion for all data (e.g. parsing Fortran-style double precision, or German vs. English decimals...). Yet loadtxt and genfromtxt in such cases require the user to construct a dictionary of converters for every single data column, a task that could be easily handled within the routines. I am therefore putting up a patch for testing that allows users to specify a default converter for all input data as converters={-1: conv} (or alternatively {'all': conv}, leaving that open to discussion). Please test the changeset (still lacking tests for the new option) at
https://github.com/dhomeier/numpy/compare/master...iocnv-wildcard
which also contains some bug fixes to handle complex input.
Awaiting your comments, Derek
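A usage sketch of the proposed interface (the converters={-1: conv} spelling is exactly what is up for discussion above; the per-column dictionary shows the current state of affairs):

import numpy as np
from StringIO import StringIO

# Fortran-style double precision literals that plain float() cannot parse
conv = lambda s: float(s.replace('D', 'E'))
data = "1.0D+00 2.5D-01\n3.0D+03 4.0D+00"

# current behaviour: one converter entry per column
b = np.loadtxt(StringIO(data), converters={0: conv, 1: conv})

# with the proposed patch, a single default converter would suffice:
# b = np.loadtxt(StringIO(data), converters={-1: conv})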
Re: [Numpy-discussion] code review/build test for datetime business day API
On 17.06.2011, at 8:05PM, Mark Wiebe wrote:
> On Thu, Jun 16, 2011 at 8:18 PM, Derek Homeier <de...@astro.physik.uni-goettingen.de> wrote:
>> On 17.06.2011, at 2:02AM, Mark Wiebe wrote:
>>>> ok, that was a lengthy hunt, but it's in printing the string in make_iso_8601_date:
>>>> tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
>>>> fprintf(stderr, "printed %d[%d]: dts->year=%lld: %s\n", tmplen, sublen, dts->year, substr);
>>>> produces
>>>> np.datetime64('1970-03-23 20:00:00Z', 'D')
>>>> printed 4[62]: dts->year=1970: numpy.datetime64('-03-23','D')
>>>> It seems snprintf is not using the correct format for INT64 (as I happened to do in fprintf before realising I had to use %lld ;-) - could it be this is a general issue, which just does not show up on little-endian machines because they happen to pass the right half of the int64 to printf?
>>>> BTW, how is this supposed to be handled (in 4 digits) if the year is indeed beyond the 32bit range (i.e. ~ 0.3 Hubble times...)? Just wondering if one could simply cast it to int32 before print.
>>> I'd prefer to fix the NPY_INT64_FMT macro. There's no point in having it if it doesn't work... What is NumPy setting it to for that platform?
>> Of course (just felt somewhat lost among all the #defines). It clearly seems to be mis-constructed on PowerPC 32: NPY_SIZEOF_LONG is 4, thus NPY_INT64_FMT is set to NPY_LONGLONG_FMT -> "Ld", but this does not seem to handle int64 on big-endian Macs - explicitly printing "%Ld", dts->year also produces 0. Changing the snprintf format to "%04lld" produces the correct output, so if nothing else avails, I suggest to put something like
>> #elif (defined(__ppc__) || defined(__ppc64__))
>>         #define LONGLONG_FMT   "lld"
>>         #define ULONGLONG_FMT  "llu"
>> #else
>> into npy_common.h (or possibly simply defined(__APPLE__), since %lld seems to work on 32bit i386 Macs just as well).
> Probably a minimally invasive change is best, also this kind of thing deserves a comment explaining the problem that was encountered with the specific platforms, so that in the future when people examine this part they can understand why this is there. Do you want to make a pull request for this change?
I'd go with the defined(__APPLE__) then, since %Ld produces wrong results on both 32bit platforms. More precisely, this
printf("%Ld - %Ld", dts->year, dts->year);
produces "0 - 1970" on ppc and "1970 - 0" on i386, while "%lld - %lld" prints "1970 - 1970" on both archs. There still is an issue (I now remember this came up with a different test a few months ago), that none of the formats seems to be able to actually print numbers > 2**32 (or 2**31, don't remember), but this seemed out of reach for anyone on this list.
Cheers, Derek
Re: [Numpy-discussion] genfromtxt converter question
Hi Gary,
On 17.06.2011, at 5:39PM, gary ruben wrote:
> Thanks for the hints Olivier and Bruce. Based on them, the following is a working solution, although I still have that itchy sense that genfromtxt should be able to do it directly.
> import numpy as np
> from StringIO import StringIO
> a = StringIO('''\
> (-3.9700,-5.0400) (-1.1318,-2.5693) (-4.6027,-0.1426) (-1.4249, 1.7330)
> (-5.4797, 0.) ( 1.8585,-1.5502) ( 4.4145,-0.7638) (-0.4805,-1.1976)
> ( 0., 0.) ( 6.2673, 0.) (-0.4504,-0.0290) (-1.3467, 1.6579)
> ( 0., 0.) ( 0., 0.) (-3.5000, 0.) ( 2.5619,-3.3708)
> ''')
> b = np.genfromtxt(a, dtype=str, delimiter=18)[:,:-1]
> b = np.vectorize(lambda x: complex(*eval(x)))(b)
> print b
It should, I think you were very close in your earlier attempt:
> On Sat, Jun 18, 2011 at 12:31 AM, Bruce Southey <bsout...@gmail.com> wrote:
>> On 06/17/2011 08:51 AM, Olivier Delalleau wrote:
>>> 2011/6/17 Bruce Southey <bsout...@gmail.com>
>>>> On 06/17/2011 08:22 AM, gary ruben wrote:
>>>>> Thanks Olivier, Your suggestion gets me a little closer to what I want, but doesn't quite work. Replacing the conversion with
>>>>> c = lambda x: np.cast[np.complex64](complex(*eval(x)))
>>>>> b = np.genfromtxt(a, converters={0:c, 1:c, 2:c, 3:c}, dtype=None, delimiter=18, usecols=range(4))
>>>>> produces
>>>>> [[(-3.9702861-5.0396185j) (-1.1318000555-2.56929993629j) (-4.60270023346-0.14259905j) (-1.42490005493+1.7334005j)]
>>>>>  [(-5.4797000885+0j) (1.8585381-1.5501999855j) (4.41450023651-0.763800024986j) (-0.480500012636-1.1976706j)]
>>>>>  [0j (6.26730012894+0j) (-0.4503485-0.028991655j) (-1.34669995308+1.65789997578j)]
>>>>>  [0j 0j (-3.5+0j) (2.56189990044-3.37080001831j)]]
>>>>> which is not yet an array of complex numbers. It seems close to the solution though.
You were just overdoing it by already creating an array with the converter; this apparently caused genfromtxt to create a structured array from the input (which could be converted back to an ndarray, but that can prove tricky as well) - similarly if you omit the dtype=None. The following
cnv = dict.fromkeys(range(4), lambda x: complex(*eval(x)))
b = np.genfromtxt(a, converters=cnv, dtype=None, delimiter=18, usecols=range(4))
directly produces a shape (4,4) complex array for me (you may have to apply an .astype(np.complex64) afterwards if so desired). BTW I think this is an interesting enough case of reading non-trivially structured data that it deserves to appear on some examples or cookbook page.
HTH, Derek
Re: [Numpy-discussion] genfromtxt converter question
On 17.06.2011, at 11:01PM, Olivier Delalleau wrote:
>> You were just overdoing it by already creating an array with the converter, this apparently caused genfromtxt to create a structured array from the input (which could be converted back to an ndarray, but that can prove tricky as well) - similarly if you omit the dtype=None. The following
>> cnv = dict.fromkeys(range(4), lambda x: complex(*eval(x)))
>> b = np.genfromtxt(a, converters=cnv, dtype=None, delimiter=18, usecols=range(4))
>> directly produces a shape (4,4) complex array for me (you may have to apply an .astype(np.complex64) afterwards if so desired). BTW I think this is an interesting enough case of reading non-trivially structured data that it deserves to appear on some examples or cookbook page.
>> HTH, Derek
> I had tried that as well and it doesn't work with numpy 1.4.1 (I get an object array). It may have been fixed in a later version.
OK, I was using the current master from github, but it works in 1.6.0 as well. I still noticed some differences between loadtxt and genfromtxt behaviour, e.g. where loadtxt would be able to take a string from the converter and automatically convert it to a number, whereas in genfromtxt the converter still had to include the float() or complex()...
Derek
Re: [Numpy-discussion] code review/build test for datetime business day API
Hi Mark,
On 16.06.2011, at 5:40PM, Mark Wiebe wrote:
>> np.datetime64('2011-06-16 02:03:04Z', 'D')
>> np.datetime64('-06-16','D')
>> I've tried to track this down in datetime.c, but unsuccessfully so (i.e. I could not connect it to any of the dts->year assignments therein).
> This is definitely perplexing. Probably the first thing to check is whether it's breaking during parsing or printing. This should always produce the same result:
> np.datetime64('1970-03-23 20:00:00Z').astype('i8')
> 7070400
> But maybe the test_days_creation is already checking that thoroughly enough. Then, maybe printf-ing the year value at various stages of the printing, like in set_datetimestruct_days, after convert_datetime_to_datetimestruct, and in make_iso_8601_date. This would at least isolate where the year is getting lost.
ok, that was a lengthy hunt, but it's in printing the string in make_iso_8601_date:
tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
fprintf(stderr, "printed %d[%d]: dts->year=%lld: %s\n", tmplen, sublen, dts->year, substr);
produces
np.datetime64('1970-03-23 20:00:00Z', 'D')
printed 4[62]: dts->year=1970: numpy.datetime64('-03-23','D')
It seems snprintf is not using the correct format for INT64 (as I happened to do in fprintf before realising I had to use %lld ;-) - could it be this is a general issue, which just does not show up on little-endian machines because they happen to pass the right half of the int64 to printf?
BTW, how is this supposed to be handled (in 4 digits) if the year is indeed beyond the 32bit range (i.e. ~ 0.3 Hubble times...)? Just wondering if one could simply cast it to int32 before print.
Cheers, Derek
Re: [Numpy-discussion] code review/build test for datetime business day API
On 17.06.2011, at 2:02AM, Mark Wiebe wrote:
>> ok, that was a lengthy hunt, but it's in printing the string in make_iso_8601_date:
>> tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
>> fprintf(stderr, "printed %d[%d]: dts->year=%lld: %s\n", tmplen, sublen, dts->year, substr);
>> produces
>> np.datetime64('1970-03-23 20:00:00Z', 'D')
>> printed 4[62]: dts->year=1970: numpy.datetime64('-03-23','D')
>> It seems snprintf is not using the correct format for INT64 (as I happened to do in fprintf before realising I had to use %lld ;-) - could it be this is a general issue, which just does not show up on little-endian machines because they happen to pass the right half of the int64 to printf?
>> BTW, how is this supposed to be handled (in 4 digits) if the year is indeed beyond the 32bit range (i.e. ~ 0.3 Hubble times...)? Just wondering if one could simply cast it to int32 before print.
> I'd prefer to fix the NPY_INT64_FMT macro. There's no point in having it if it doesn't work... What is NumPy setting it to for that platform?
Of course (just felt somewhat lost among all the #defines). It clearly seems to be mis-constructed on PowerPC 32: NPY_SIZEOF_LONG is 4, thus NPY_INT64_FMT is set to NPY_LONGLONG_FMT -> "Ld", but this does not seem to handle int64 on big-endian Macs - explicitly printing "%Ld", dts->year also produces 0. Changing the snprintf format to "%04lld" produces the correct output, so if nothing else avails, I suggest to put something like
#elif (defined(__ppc__) || defined(__ppc64__))
        #define LONGLONG_FMT   "lld"
        #define ULONGLONG_FMT  "llu"
#else
into npy_common.h (or possibly simply defined(__APPLE__), since %lld seems to work on 32bit i386 Macs just as well).
Cheers, Derek