[Numpy-discussion] Bug in einsum?
Hi, I have encountered a very weird behaviour with einsum. I try to compute something like R*A*R', where * denotes a kind of matrix multiplication. However, for particular shapes of R and A, the results are extremely bad. I compare two einsum results: first, I compute the result in two einsum calls as (R*A)*R'; second, I compute the whole result in one einsum call. The results differ significantly for some shapes.

My test:

    import numpy as np
    for D in range(30):
        A = np.random.randn(100, D, D)
        R = np.random.randn(D, D)
        Y1 = np.einsum('...ik,...kj->...ij', R, A)
        Y1 = np.einsum('...ik,...kj->...ij', Y1, R.T)
        Y2 = np.einsum('...ik,...kl,...lj->...ij', R, A, R.T)
        print('D=%d' % D, np.allclose(Y1, Y2), np.linalg.norm(Y1 - Y2))

Output:

    D=0 True 0.0
    D=1 True 0.0
    D=2 True 8.40339658678e-15
    D=3 True 8.09995399928e-15
    D=4 True 3.59428803435e-14
    D=5 False 34.755610184
    D=6 False 28.3576558351
    D=7 False 41.5402690906
    D=8 True 2.31709582841e-13
    D=9 False 36.0161112799
    D=10 True 4.76237746912e-13
    D=11 True 4.5790782e-13
    D=12 True 4.90302218301e-13
    D=13 True 6.96175851271e-13
    D=14 True 1.10067181384e-12
    D=15 True 1.29095933163e-12
    D=16 True 1.3466837332e-12
    D=17 True 1.52265065763e-12
    D=18 True 2.05407923852e-12
    D=19 True 2.33327630748e-12
    D=20 True 2.96849358082e-12
    D=21 True 3.31063706175e-12
    D=22 True 4.28163620455e-12
    D=23 True 3.58951880681e-12
    D=24 True 4.69973694769e-12
    D=25 True 5.47385264567e-12
    D=26 True 5.49643316347e-12
    D=27 True 6.75132988402e-12
    D=28 True 7.86435437892e-12
    D=29 True 7.85453681029e-12

So, for D={5,6,7,9}, allclose returns False and the error norm is HUGE. It doesn't seem like just some small numerical inaccuracy, because the error norm is so large. I don't know which one is correct (Y1 or Y2), but at least one of them must be wrong in my opinion. I ran the same test several times, and the same values of D fail each time. If I change the shapes somehow, the failing values of D may change too, but I usually have several failing values.

I'm running the latest version from github (commit bd7104cef4) under Python 3.2.3. With NumPy 1.6.1 under Python 2.7.3 the test crashes and Python exits printing "Floating point exception".

This seems so weird to me that I wonder if I'm just doing something stupid. Thanks a lot for any help!

Jaakko
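One way to see which of Y1 and Y2 is wrong is to cross-check against plain np.dot, one matrix at a time (a minimal sketch, not part of the original report; rar_reference is a hypothetical helper name):

    import numpy as np

    def rar_reference(R, A):
        # Stack R * A[n] * R' computed with plain np.dot for each leading index n
        return np.array([np.dot(np.dot(R, A[n]), R.T) for n in range(A.shape[0])])

    D = 5
    A = np.random.randn(100, D, D)
    R = np.random.randn(D, D)
    Y1 = np.einsum('...ik,...kj->...ij', R, A)
    Y1 = np.einsum('...ik,...kj->...ij', Y1, R.T)
    print(np.allclose(Y1, rar_reference(R, A)))  # does the two-call route match np.dot?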
Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?
Neal Becker wrote:
> Neal Becker wrote:
>> I guess I talked to you about 100 years ago about sharing state between
>> numpy rng and code I have in c++ that wraps boost::random. So is there a
>> C-api for this RandomState object I could use to call from c++? Maybe I
>> could do something with that. The c++ code could invoke via the python
>> api, but that might be slower. I'm just rambling here, I'd have to see
>> the API to get some ideas.
> I think if I could just grab a long int from the underlying mersenne
> twister, through some c api?

Well, this at least appears to work - probably not the most efficient approach - it calls the RandomState object via the python interface to get 4 bytes at a time:

    int test1(bp::object rs) {
        bp::str bytes = call_method<bp::str>(rs.ptr(), "bytes", 4); // get 4 bytes
        return *reinterpret_cast<int*>(PyString_AS_STRING(bytes.ptr()));
    }

    BOOST_PYTHON_MODULE(numpy_rand) {
        boost::numpy::initialize();
        def("test1", test1);
    }
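For reference, the Python-level call the C++ snippet wraps is RandomState.bytes(), which returns raw output from the underlying Mersenne Twister stream. A sketch of the equivalent pure-Python round trip:

    import numpy as np

    rs = np.random.RandomState(42)
    raw = rs.bytes(4)  # the same 4 bytes the C++ code fetches
    value = int(np.frombuffer(raw, dtype=np.int32)[0])  # reinterpret as a C int
    print(value)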
[Numpy-discussion] fast numpy.fromfile skipping data chunks
Hi everybody, I hope this has not been discussed before; I couldn't find a solution elsewhere. I need to read some binary data, and I am using numpy.fromfile to do this. Since the files are huge and would make me run out of memory, I need to read the data skipping some records (I am reading data recorded at high frequency, so basically I want to read it subsampled). At the moment, I came up with the code below, which is then compiled using cython. Despite the significant performance increase over the pure python version, the function is still much slower than numpy.fromfile, and it only reads one kind of data (in this case uint32); otherwise I do not know how to define the array type in advance. I have basically no experience with cython or c, so I am a bit stuck. How can I try to make this more efficient and possibly more generic?

Thanks

    import numpy as np
    # For cython!
    cimport numpy as np
    from libc.stdint cimport uint32_t

    def cffskip32(fid, int count=1, int skip=0):
        cdef int k = 0
        cdef np.ndarray[uint32_t, ndim=1] data = np.zeros(count, dtype=np.uint32)
        if skip >= 0:
            while k < count:
                try:
                    data[k] = np.fromfile(fid, count=1, dtype=np.uint32)
                    fid.seek(skip, 1)
                    k += 1
                except ValueError:
                    data = data[:k]
                    break
        return data
Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks
On Wed, Mar 13, 2013 at 1:45 PM, Andrea Cimatoribus andrea.cimatori...@nioz.nl wrote:
> Hi everybody, I hope this has not been discussed before, I couldn't find a
> solution elsewhere. I need to read some binary data, and I am using
> numpy.fromfile to do this. Since the files are huge, and would make me run
> out of memory, I need to read data skipping some records (I am reading
> data recorded at high frequency, so basically I want to read subsampling).
> [...] How can I try to make this more efficient and possibly more generic?

If your data is stored as fixed-format binary (as it seems it is), then the easiest way is probably:

    # Exploit the operating system's virtual memory manager to get a
    # "virtual copy" of the entire file in memory
    # (this does not actually use any memory until accessed):
    virtual_arr = np.memmap(path, np.uint32, "r")
    # Get a numpy view onto every 20th entry:
    virtual_arr_subsampled = virtual_arr[::20]
    # Copy those bits into regular malloc'ed memory:
    arr_subsampled = virtual_arr_subsampled.copy()

(Your data is probably large enough that this will only work if you're using a 64-bit system, because of address space limitations; but if you have data that's too large to fit into memory, then I assume you're using a 64-bit system anyway...)

-n
[Numpy-discussion] numpy reference array
Is it possible to create a numpy array which points to the same data in a different numpy array (but in a different order etc.)? For example:

Code:
--
    import numpy as np
    a = np.arange(10)
    ids = np.array([0,0,5,5,9,9,1,1])
    b = a[ids]
    a[0] = -1
    b[0]   # should be -1 if b[0] referenced the same data as a[0]
    0
--

ctypes almost does it for me, but the access is inconvenient. I would like to access b as a regular numpy array:

Code:
--
    import numpy as np
    import ctypes
    a = np.arange(10)
    ids = np.array([0,0,5,5,9,9,1,1])
    b = [a[id:id+1].ctypes.data_as(ctypes.POINTER(ctypes.c_long)) for id in ids]
    a[0] = -1
    b[0][0]   # access is inconvenient
    -1
--

Some more information: I've written a finite-element code, and I'm working on optimizing the python implementation. Profiling shows the slowest operation is the re-creation of an array that extracts edge degrees of freedom from the volume of the element (similar to b above). So, I'm trying to avoid copying the data every time, and instead set up 'b' just once. The ctypes solution is sub-optimal since my code is mostly vectorized; that is, later I'd like to do something like

Code:
--
    c[ids] = b[ids] + d[ids]
--

where c and d are the same shape as b but contain different data. Any thoughts? If it's not possible, knowing that will save me time searching.
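An arbitrarily reordered view is not possible (see the reply later in this digest), but a common fallback is to keep only ids and apply it at use time: fancy indexing copies, yet stays fully vectorized. A sketch, with hypothetical arrays a and d standing in for the real data:

    import numpy as np

    a = np.arange(10)
    d = np.arange(10, 20)
    ids = np.array([0, 0, 5, 5, 9, 9, 1, 1])
    c = np.zeros_like(a)
    # gather, add, scatter in one vectorized statement
    # (with repeated indices, the last write to a slot wins):
    c[ids] = a[ids] + d[ids]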
[Numpy-discussion] R: fast numpy.fromfile skipping data chunks
This solution does not work for me, since I have an offset before the data that is not a multiple of the datatype (it's a header containing various stuff). I'll look at pytables.

> # Exploit the operating system's virtual memory manager to get a
> # "virtual copy" of the entire file in memory
> # (this does not actually use any memory until accessed):
> virtual_arr = np.memmap(path, np.uint32, "r")
> # Get a numpy view onto every 20th entry:
> virtual_arr_subsampled = virtual_arr[::20]
> # Copy those bits into regular malloc'ed memory:
> arr_subsampled = virtual_arr_subsampled.copy()
[Numpy-discussion] Dot/inner products with broadcasting?
Hi! How can I compute a dot product (or similar multiply-and-sum operations) efficiently so that broadcasting is utilized? For multi-dimensional arrays, NumPy's inner and dot functions do not match the leading axes and broadcast over them; instead the result has first the leading axes of the first input array and then the leading axes of the second input array.

For instance, I would like to compute the following inner product:

    np.sum(A*B, axis=-1)

But numpy.inner gives:

    A = np.random.randn(2,3,4)
    B = np.random.randn(3,4)
    np.inner(A, B).shape  # -> (2, 3, 3) instead of (2, 3)

Similarly for the dot product, I would like to compute for instance:

    np.sum(A[...,:,:,np.newaxis] * B[...,np.newaxis,:,:], axis=-2)

But numpy.dot gives:

    In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5)
    In [13]: np.dot(A, B).shape  # -> (2, 3, 2, 5) instead of (2, 3, 5)

I could use einsum for these operations, but I'm not sure whether that's as efficient as using some BLAS-supported(?) dot products. I couldn't find any function which could perform this kind of operation. NumPy's functions seem to either flatten the input arrays (vdot, outer) or just use the axes of the input arrays separately (dot, inner, tensordot). Any help?

Best regards,
Jaakko
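For reference, both requested operations can be spelled with einsum, whose '...' notation broadcasts over leading axes (a sketch; whether this takes a BLAS path is exactly the open question above):

    import numpy as np

    A = np.random.randn(2, 3, 4)
    B = np.random.randn(3, 4)
    # inner product over the last axis, equivalent to np.sum(A*B, axis=-1):
    print(np.einsum('...i,...i->...', A, B).shape)      # (2, 3)

    A = np.random.randn(2, 3, 4)
    B = np.random.randn(2, 4, 5)
    # matrix product broadcast over the leading axes:
    print(np.einsum('...ik,...kj->...ij', A, B).shape)  # (2, 3, 5)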
[Numpy-discussion] R: fast numpy.fromfile skipping data chunks
I see that pytables deals with hdf5 data. It would be very nice if the data were in such a standard format, but that is not the case, and that can't be changed.

From: numpy-discussion-boun...@scipy.org [numpy-discussion-boun...@scipy.org] on behalf of Frédéric Bastien [no...@nouiz.org]
Sent: Wednesday, 13 March 2013 15.03
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks

Hi,

I would suggest that you look at pytables[1]. It uses a different file format, but it seems to do exactly what you want, and it gives you an object with a very similar interface to numpy.ndarray (but fewer functions). You would just ask for the slice/indices that you want, and it returns a numpy.ndarray.

HTH

Frédéric

[1] http://www.pytables.org/moin

[clip: quoted messages from Nathaniel Smith and Andrea Cimatoribus, reproduced earlier in this digest]
Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks
On Wed, Mar 13, 2013 at 2:18 PM, Andrea Cimatoribus andrea.cimatori...@nioz.nl wrote:
> This solution does not work for me since I have an offset before the data
> that is not a multiple of the datatype (it's a header containing various
> stuff).

np.memmap takes an offset= argument.

-n
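A minimal sketch of that, assuming a hypothetical header size header_bytes and file name data.bin:

    import numpy as np

    header_bytes = 512  # hypothetical size of the header in this file format
    virtual_arr = np.memmap("data.bin", dtype=np.uint32, mode="r",
                            offset=header_bytes)
    arr_subsampled = virtual_arr[::20].copy()  # every 20th record, as before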
[Numpy-discussion] R: R: fast numpy.fromfile skipping data chunks
Indeed, but the help says that offset should be a multiple of the byte-size of the dtype. Indeed, this is silly.

[clip: quoted message from Nathaniel Smith, reproduced earlier in this digest]
[Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks
On top of that, there is another issue: it can be that the available data itself is not a multiple of the dtype size, since there can be write errors at the end of the file, and I don't know how to deal with that.

[clip: quoted messages from Andrea Cimatoribus and Nathaniel Smith, reproduced earlier in this digest]
[Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks
> Indeed, but the help says that offset should be a multiple of the
> byte-size of the dtype.

My mistake, sorry: even though the help says so, it seems that this is not the case in the actual code. Still, the problem with the size of the available data (which is not necessarily a multiple of the dtype byte-size) remains.

ac
Re: [Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks
On Wed, Mar 13, 2013 at 2:46 PM, Andrea Cimatoribus andrea.cimatori...@nioz.nl wrote:
> My mistake, sorry: even though the help says so, it seems that this is not
> the case in the actual code. Still, the problem with the size of the
> available data (which is not necessarily a multiple of the dtype
> byte-size) remains.

Worst case you can always work around such issues with an extra layer of view manipulation:

    # create a raw view onto the contents of the file
    file_bytes = np.memmap(path, dtype=np.uint8, ...)
    # cut out any arbitrary number of bytes from the beginning and end
    data_bytes = file_bytes[...some slice expression...]
    # switch to viewing the bytes as the proper data type
    data = data_bytes.view(dtype=np.uint32)
    # proceed as before

-n
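A concrete version of that sketch, with assumed names (header_bytes and data.bin are hypothetical) and the trailing partial record trimmed off:

    import numpy as np

    header_bytes = 512                       # hypothetical header size
    itemsize = np.dtype(np.uint32).itemsize  # 4 bytes per record

    # raw byte-level view of the whole file:
    file_bytes = np.memmap("data.bin", dtype=np.uint8, mode="r")
    # keep only whole records after the header, dropping any partial
    # record left by a write error at the end of the file:
    n_payload = (len(file_bytes) - header_bytes) // itemsize * itemsize
    data_bytes = file_bytes[header_bytes : header_bytes + n_payload]
    data = data_bytes.view(dtype=np.uint32)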
Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks
On 3/13/13 2:45 PM, Andrea Cimatoribus wrote:
> Hi everybody, I hope this has not been discussed before, I couldn't find a
> solution elsewhere. I need to read some binary data, and I am using
> numpy.fromfile to do this. Since the files are huge, and would make me run
> out of memory, I need to read data skipping some records (I am reading
> data recorded at high frequency, so basically I want to read subsampling).
[clip]

You can do a fid.seek(offset) prior to np.fromfile() and then it will read from offset. See the docstrings for `file.seek()` on how to use it.

-- Francesc Alted
Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks
On 3/13/13 3:53 PM, Francesc Alted wrote:
> You can do a fid.seek(offset) prior to np.fromfile() and then it will read
> from offset. See the docstrings for `file.seek()` on how to use it.

Oops, you were already using file.seek(). Disregard, please.

-- Francesc Alted
[Numpy-discussion] R: R: R: R: fast numpy.fromfile skipping data chunks
Ok, this seems to be working (well, as soon as I get the right offset and things like that, but that's a different story). The problem is that it does not go any faster than my initial function compiled with cython, and it is still a lot slower than fromfile. Is there a reason why, even with compiled code, reading from a file while skipping some records should be slower than reading the whole file?

[clip: quoted message from Nathaniel Smith, reproduced earlier in this digest]
Re: [Numpy-discussion] R: R: R: R: fast numpy.fromfile skipping data chunks
On 13 Mar 2013 15:16, Andrea Cimatoribus andrea.cimatori...@nioz.nl wrote:
> Ok, this seems to be working (well, as soon as I get the right offset and
> things like that, but that's a different story). The problem is that it
> does not go any faster than my initial function compiled with cython, and
> it is still a lot slower than fromfile. Is there a reason why, even with
> compiled code, reading from a file while skipping some records should be
> slower than reading the whole file?

Oh, in that case you're probably IO bound, not CPU bound, so Cython etc. can't help.

Traditional spinning-disk hard drives can read quite quickly, but take a long time to find the right place to read from and start reading. Your OS has heuristics in it to detect sequential reads and automatically start the setup for the next read while you're processing the previous one, so you don't see the seek overhead. If your reads are widely separated enough, these heuristics will get confused and you'll drop back to doing a new disk seek on every call to read(), which is deadly. (And that would explain what you're seeing.)

If this is what's going on, your best bet is to just write a python loop that uses fromfile() to read some largeish (megabytes?) chunk, subsample it and throw away the rest, and repeat.

-n
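A rough cut of that chunked loop (a sketch only; step and chunk_items are made-up names, and the chunk size is an arbitrary choice):

    import numpy as np

    def subsample_file(fid, dtype=np.uint32, step=20, chunk_items=1000000):
        # chunk_items should be a multiple of step so the subsampling
        # stays aligned across chunk boundaries
        out = []
        while True:
            chunk = np.fromfile(fid, dtype=dtype, count=chunk_items)
            if chunk.size == 0:
                break
            out.append(chunk[::step].copy())  # keep every step-th record
            if chunk.size < chunk_items:
                break  # reached end of file
        return np.concatenate(out) if out else np.zeros(0, dtype=dtype)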
[Numpy-discussion] Performance of einsum?
Hi, I was wondering if someone could provide some intuition on the performance of einsum. I have found that sometimes it is extremely efficient, but sometimes it is several orders of magnitude slower than other approaches, for instance, using multiple dot calls.

My intuition is that the computation time of einsum is linear in the size of the index space, that is, the product of the maximums of the indices. So, for instance, computing the dot product of three matrices A*B*C as einsum('ij,jk,kl->il', A, B, C) would not be efficient, because there are four indices i=1,...,I, j=1,...,J, k=1,...,K and l=1,...,L, so the total computation time is O(I*J*K*L). This is much worse than the two dot products at O(I*J*K + J*K*L), or two einsum calls for Y=A*B and then Y*C. On the other hand, computing einsum('ij,ij,ij->i', A, B, C) would be efficient because the computation time is only O(I*J).

Is this intuition roughly correct, or how could I intuitively understand when the usage of einsum is bad?

Best regards,
Jaakko
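The intuition can be checked directly with timeit (a sketch; the sizes are arbitrary choices, and einsum at the time of this thread performed no automatic contraction-order optimization):

    import numpy as np
    import timeit

    I = J = K = L = 100
    A = np.random.randn(I, J)
    B = np.random.randn(J, K)
    C = np.random.randn(K, L)

    one_call = lambda: np.einsum('ij,jk,kl->il', A, B, C)  # loops over O(I*J*K*L)
    two_dots = lambda: np.dot(np.dot(A, B), C)             # O(I*J*K + J*K*L)

    print(timeit.timeit(one_call, number=10))
    print(timeit.timeit(two_dots, number=10))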
[Numpy-discussion] R: fast numpy.fromfile skipping data chunks
Thanks a lot for the feedback, I'll try to modify my function to overcome this issue. Since I'm in the process of buying new hardware too, a slight OT (but definitely related): does an SSD provide a substantial improvement in these cases?

[clip: quoted message from Nathaniel Smith, reproduced earlier in this digest]
Re: [Numpy-discussion] (@Pat Marion) Re: Yes, this one again ImportError: No module named multiarray
Many thanks Pat - the numpy discussion list is brill. Go ahead and see if the CPython developers would be interested, as it is a problem that appears all the time on boards/lists.

Best ... Dinesh

From: Pat Marion
Sent: Tuesday, March 12, 2013 7:23 AM
To: Aron Ahmadia
Cc: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] (@Pat Marion) Re: Yes, this one again ImportError: No module named multiarray

Thanks for copying me, Aron.

Hi Dinesh, I have a github project which demonstrates how to use numpy with freeze. The project's readme includes more information: https://github.com/patmarion/NumpyBuiltinExample

It does require a small patch to CPython's import.c file. I haven't tried posting this patch to the CPython developers; perhaps there'd be interest in incorporating it upstream.

Pat
Re: [Numpy-discussion] Yes, this one again ImportError: No module named multiarray
Hi Chris

Darn! It worked this morning and I don't know why. Focused on PyInstaller because it creates a single executable. Testing on all major versions of Windows (32-bit and 64-bit), Linux and OSX. The problem OS is, unsurprisingly, Windows XP (SP3). Numpy was upgraded to the mkl-version, and maybe that did the trick. Tried to replicate on an identical Windows XP machine using the standard sourceforge distribution, but that resulted in a pyinstaller error. Anyway, using the latest releases of all software, i.e. Python 2.7.3, Numpy 1.7.0, Scipy 0.11.0, PyInstaller 2.0. Will post back if I run into problems again.

Best ...

From: Chris Barker - NOAA Federal chris.bar...@noaa.gov
Sent: Tuesday, March 12, 2013 2:50 PM
To: Discussion of Numerical Python numpy-discussion@scipy.org
Subject: Re: [Numpy-discussion] Yes, this one again ImportError: No module named multiarray

On Tue, Mar 12, 2013 at 7:05 AM, Dinesh B Vadhia dineshbvad...@hotmail.com wrote:
> Does that mean numpy won't work with freeze/create_executable type of
> tools or is there a workaround?

I've used numpy with py2exe and py2app out of the box with no issues (actually, there is an issue with too much stuff getting bundled up, but it works).

> [...] ImportError let alone what the solution is. The Traceback, similar
> to others found on the web, is:
>
> Traceback (most recent call last):
>   File test.py, ...
>   File C:\Python27\lib\site-packages\numpy\__init__.py, line 154, in module

This indicates that your code is importing the numpy that's inside the system installation -- it should be using the one in your app bundle. What bundling tool are you using? How did you install python/numpy? What does your bundling tool config look like? And, of course, version numbers of everything.

-Chris

Christopher Barker, Ph.D.
Oceanographer, NOAA/NOS/ORR
chris.bar...@noaa.gov
Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks
> Since the files are huge, and would make me run out of memory, I need to
> read data skipping some records

Is it possible to describe what you're doing with the data once you have subsampled it? And if there were a way to work with the full-resolution data, would that be desirable?

I ask because I've been dabbling with a pure-Python library for handling larger-than-memory datasets - https://github.com/SciTools/biggus - and it uses similar chunking techniques as mentioned in the other replies to process data at the full streaming I/O rate. It's still in the early stages of development, so the design is fluid; maybe it's worth seeing if there's enough in common with your needs to warrant adding your use case.

Richard

[clip: quoted message from Andrea Cimatoribus, reproduced earlier in this digest]
Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks
On 13 March 2013 16:54, Andrea Cimatoribus andrea.cimatori...@nioz.nl wrote:
> Since I'm in the process of buying new hardware too, a slight OT (but
> definitely related): does an SSD provide substantial improvement in these
> cases?

It should help. Nevertheless, when talking about performance, it is difficult to predict, mainly because in a computer there are many things going on and many layers involved. I have a couple of computers equipped with SSDs; if you send me some benchmarks, I can run them and see if I get any speedup.
[Numpy-discussion] can't run cython on mtrand.pyx
Grabbed numpy-1.7.0 source. Cython is 0.18.

    cython mtrand.pyx

produces lots of errors.
Re: [Numpy-discussion] can't run cython on mtrand.pyx
On Wed, Mar 13, 2013 at 6:40 PM, Neal Becker ndbeck...@gmail.com wrote:
> Grabbed numpy-1.7.0 source. Cython is 0.18.
>
>     cython mtrand.pyx
>
> produces lots of errors.

It helps to copy-and-paste the errors that you are seeing. In any case, Cython 0.18 works okay on master's mtrand.pyx sources.

-- Robert Kern
Re: [Numpy-discussion] can't run cython on mtrand.pyx
Robert Kern wrote:
> It helps to copy-and-paste the errors that you are seeing. In any case,
> Cython 0.18 works okay on master's mtrand.pyx sources.

Well, this is the first error:

    $ cython mtrand.pyx

    Error compiling Cython file:
    ...
                                 PyArray_DIMS(oa), NPY_DOUBLE)
        length = PyArray_SIZE(array)
        array_data = <double *>PyArray_DATA(array)
        itera = <flatiter>PyArray_IterNew(<object>oa)
        for i from 0 <= i < length:
            array_data[i] = func(state, (<double *>(itera.dataptr))[0])
                                        ^

    mtrand.pyx:177:41: Python objects cannot be cast to pointers of primitive types
Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks
On Wed, Mar 13, 2013 at 9:54 AM, Andrea Cimatoribus andrea.cimatori...@nioz.nl wrote:
> Thanks a lot for the feedback, I'll try to modify my function to overcome
> this issue. Since I'm in the process of buying new hardware too, a slight
> OT (but definitely related): does an SSD provide substantial improvement
> in these cases?

It should. Seek time on an SSD is quite low, and readout is fast. Skipping over items will probably not be as fast as a sequential read, but I expect it will be substantially faster than a disk. Nathaniel's loop idea will probably work faster also. The sequential readout rate of a modern SSD is about 500 MB/sec, so you can probably just divide that into your file size to get an estimate of the time needed (e.g., roughly 8 seconds for a 4 GB file).

<snip>

Chuck
Re: [Numpy-discussion] numpy reference array
On Wed, Mar 13, 2013 at 6:56 AM, Matt U mpuec...@mit.edu wrote:
> Is it possible to create a numpy array which points to the same data in a
> different numpy array (but in different order etc)?

You can do this (easily), but only if the different order can be defined in terms of strides. A simple example is a transpose:

    In [3]: a = np.arange(12).reshape((3,4))

    In [4]: a
    Out[4]:
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])

    In [5]: b = a.T

    In [6]: b
    Out[6]:
    array([[ 0,  4,  8],
           [ 1,  5,  9],
           [ 2,  6, 10],
           [ 3,  7, 11]])

    # b is the transpose of a, but a view on the same data block.
    # change a:
    In [7]: a[2,1] = 44

    In [8]: a
    Out[8]:
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8, 44, 10, 11]])

    # b is changed, too.
    In [9]: b
    Out[9]:
    array([[ 0,  4,  8],
           [ 1,  5, 44],
           [ 2,  6, 10],
           [ 3,  7, 11]])

Check out stride tricks for clever things you can do. But numpy does require that the data in your array be a contiguous block, in order, so you can't arbitrarily re-arrange it while keeping a view.

HTH,
-Chris

Christopher Barker, Ph.D.
Oceanographer, NOAA/NOS/ORR
chris.bar...@noaa.gov
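As an illustration of the stride tricks mentioned above (a sketch only; it does not make the poster's arbitrary ids ordering possible), a strided view can expose overlapping windows of the same data block without copying:

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    a = np.arange(10)
    step = a.strides[0]
    # 8 overlapping length-3 windows onto a's buffer, no copy made
    # (as_strided views are raw; reading or writing past the original
    # data is unsafe):
    windows = as_strided(a, shape=(8, 3), strides=(step, step))
    a[2] = 99
    print(windows[0])  # [ 0  1 99] -- the view sees the change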
[Numpy-discussion] Any help from Numpy community?
Any help from the Numpy community is appreciated.

    [[ 0.    1.54  0.    0.    0.    1.08  1.08  1.08]
     [ 1.54  0.    1.08  1.08  1.08  0.    0.    0.  ]
     [ 0.    1.08  0.    0.    0.    0.    0.    0.  ]
     [ 0.    1.08  0.    0.    0.    0.    0.    0.  ]
     [ 0.    1.08  0.    0.    0.    0.    0.    0.  ]
     [ 1.08  0.    0.    0.    0.    0.    0.    0.  ]
     [ 1.08  0.    0.    0.    0.    0.    0.    0.  ]
     [ 1.08  0.    0.    0.    0.    0.    0.    0.  ]]

The above is the numpy array matrix. The numbers represent bond lengths: C-C = 1.54 and C-H = 1.08. So I want to write this in the form "C of index i is connected to C of index j" and "C of index i is connected to H of index j":

    (C(i), C(j))  # key C(i) and value C(j)
    (C(i), H(j))  # key C(i) and value H(j); the key C(i) can be repeated,
                  # one entry for each of the values H(j)

To summarize, the output may look like:

    C1 is connected to C2
    C1 is connected to H1
    C1 is connected to H3
    C2 is connected to H2
    etc.

Any guide is greatly appreciated, thanks

birda
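One way to get that listing (a sketch; the labels list is an assumption based on the matrix shown - two carbons first, then the hydrogens - so adapt it to the real atom ordering):

    import numpy as np

    D = np.array([
        [0.00, 1.54, 0.00, 0.00, 0.00, 1.08, 1.08, 1.08],
        [1.54, 0.00, 1.08, 1.08, 1.08, 0.00, 0.00, 0.00],
        [0.00, 1.08, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
        [0.00, 1.08, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
        [0.00, 1.08, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
        [1.08, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
        [1.08, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
        [1.08, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    ])
    labels = ['C1', 'C2', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6']

    # each nonzero entry in the upper triangle is one bond:
    rows, cols = np.nonzero(np.triu(D))
    connections = {}
    for i, j in zip(rows, cols):
        # key C(i), values: everything bonded to it
        connections.setdefault(labels[i], []).append(labels[j])
        print('%s is connected to %s' % (labels[i], labels[j]))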
Re: [Numpy-discussion] Yes, this one again ImportError: No module named multiarray
Glad you got it working! For those who might be interested: the distinction between the example I linked to and packaging tools like PyInstaller or py2exe is that NumpyBuiltinExample (https://github.com/patmarion/NumpyBuiltinExample) uses static linking to embed numpy as a builtin module. At runtime, there is no dynamic loading and there is no filesystem access. The technique is targeted at HPC or embedded systems where you might want to avoid touching the filesystem, or avoid dynamic loading.

Pat

[clip: quoted message from Dinesh B Vadhia, reproduced earlier in this digest]