Re: [Numpy-discussion] multiprocessing shared arrays and numpy
On Sunday 07 March 2010 20:03:21, Gael Varoquaux wrote:
> On Sun, Mar 07, 2010 at 07:00:03PM +, René Dudfield wrote:
> > 1. Mmap'd files are useful since you can reuse disk cache as program
> > memory. So large files don't waste ram on the disk cache.
>
> I second that. mmaping has worked very well for me for large datasets,
> especially in the context of reducing memory pressure.

As far as I know, memmap files (or better, the underlying OS) *use* all
available RAM for loading data until RAM is exhausted and then start to use
SWAP, so the memory pressure is still there.  But I may be wrong...

-- 
Francesc Alted
Re: [Numpy-discussion] multiprocessing shared arrays and numpy
On Thu, Mar 11, 2010 at 10:04:36AM +0100, Francesc Alted wrote:
> As far as I know, memmap files (or better, the underlying OS) *use* all
> available RAM for loading data until RAM is exhausted and then start to
> use SWAP, so the memory pressure is still there.  But I may be wrong...

I believe that your above assertion is 'half' right. First, I think that it
is not SWAP that the memmapped file uses, but the original disk space, thus
you avoid running out of SWAP. Second, if you open the same data several
times without memmapping, I believe it will be duplicated in memory. On the
other hand, when you memmap, it is not duplicated: if you are running
several processing jobs on the same data, you save memory. I am very much
in this case.

Gaël
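[A minimal sketch of the pattern Gaël describes, assuming a pre-built data
file; the file name, dtype, and shape here are hypothetical. Each worker
opens the same file read-only with numpy.memmap, so the OS can back all
processes with a single set of physical pages:]

import multiprocessing as mp
import numpy as np

def column_means(path):
    # Each worker maps the file read-only; pages come from the shared
    # OS page cache, so the data is not duplicated per process.
    data = np.memmap(path, dtype=np.float64, mode='r', shape=(1000, 1000))
    return data.mean(axis=0)

if __name__ == '__main__':
    path = 'big_matrix.dat'                  # hypothetical data file
    np.random.rand(1000, 1000).tofile(path)  # create it for the demo
    pool = mp.Pool(processes=4)
    results = pool.map(column_means, [path] * 4)
    pool.close()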
Re: [Numpy-discussion] Calling routines from a Fortran library using python
On Mon, 22 Feb 2010 22:18:23 +0900, David Cournapeau courn...@gmail.com wrote:
> On Mon, Feb 22, 2010 at 10:01 PM, Nils Wagner
> nwag...@iam.uni-stuttgart.de wrote:
> >   ar x test.a
> >   gfortran -shared *.o -o libtest.so -lg2c
> >
> > to build a shared library. The additional option -lg2c was necessary
> > due to an undefined symbol: s_cmp
>
> You should avoid the -lg2c option at any cost if compiling with gfortran.
> I am afraid that you got a library compiled with g77. If that's the case,
> you should use g77 and not gfortran: you cannot mix libraries built with
> the one with libraries built with the other.
>
> > Now I am able to load the shared library:
> >
> >   from ctypes import *
> >   my_lib = CDLL('test.so')
> >
> > What are the next steps to use the library functions within Python?
>
> You use it as you would use a C library:
>
> http://python.net/crew/theller/ctypes/tutorial.html
>
> But the Fortran ABI, at least for code built with g77 and gfortran,
> passes everything by reference. To make sure you pass the right
> arguments, I strongly suggest double-checking against the .h you
> received.
>
> cheers,
>
> David

Hi all,

I tried to run the following script. The result is a segmentation fault.
Did I use byref correctly?

from ctypes import *
my_dsio = CDLL('libdsio20_gnu4.so')  # loading dynamic link libraries
#
# FORTRAN :  CALL DSIO(JUNCAT,FDSCAT,IERR)
#
# int   I,J,K,N,IDE,IA,IE,IERR,JUNIT,JUNCAT,NDATA,NREC,LREADY,ONE=1;
# Word  BUF[100],HEAD[30];
# char  *PATH,*STRING;
# char  *PGNAME,*DATE,*TIME,*TEXT;
# int   LHEAD=30;
#
# C :  DSIO(JUNCAT,FDSCAT,IERR,strlen(FDSCAT));
#
IERR    = c_int()
FDSCAT  = c_char_p('dscat.ds')
JUNCAT  = c_int()
LDSNCAT = c_int(len(FDSCAT.value))
print
print 'LDSNCAT', LDSNCAT.value
print 'FDSCAT' , FDSCAT.value , len(FDSCAT.value)
my_dsio.dsio(byref(JUNCAT),byref(FDSCAT),byref(IERR),byref(LDSNCAT))  # segmentation fault
print IERR.value

Any idea?

Nils
Re: [Numpy-discussion] Calling routines from a Fortran library using python
Nils Wagner wrote:
> [earlier discussion snipped]
>
> I tried to run the following script. The result is a segmentation fault.
> Did I use byref correctly?
> [...]
> my_dsio.dsio(byref(JUNCAT),byref(FDSCAT),byref(IERR),byref(LDSNCAT))  # segmentation fault
> [...]
> Any idea?

You shouldn't have byref on FDSCAT nor LDSNCAT, as explained by this line:

# C :  DSIO(JUNCAT,FDSCAT,IERR,strlen(FDSCAT));

Dag Sverre
Re: [Numpy-discussion] multiprocessing shared arrays and numpy
Here is a strange thing I am getting with multiprocessing and a memory
mapped array. The script below generates the error message 30 times (once
for every slice access):

Exception AttributeError: AttributeError("'NoneType' object has no
attribute 'tell'",) in <bound method memmap.__del__ of memmap(2949995000.0)> ignored

although I get the correct answer eventually.

--------------------------------------------------------------------
import numpy as N
import multiprocessing as MP

def average(cube):
    return [plane.mean() for plane in cube]

N.arange(30*100*100, dtype=N.int32).tofile(open('30x100x100_int32.dat','w'))

data = N.memmap('30x100x100_int32.dat', dtype=N.int32, shape=(30,100,100))

pool = MP.Pool(processes=1)
job = pool.apply_async(average, [data,])
print job.get()
--------------------------------------------------------------------

I use python 2.6.4 and numpy 1.4.0 on 64 bit linux (amd64).

  Nadav

-----Original Message-----
From: numpy-discussion-boun...@scipy.org on behalf of Gael Varoquaux
Sent: Thu 11-Mar-10 11:36
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] multiprocessing shared arrays and numpy

> [quoted message trimmed; see Gael Varoquaux's reply above]
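[One workaround that is consistent with the traceback, though not a
confirmed diagnosis: pass the file name instead of the memmap object, so no
memmap ever gets pickled into the worker; the worker recreates it locally.
A sketch reusing Nadav's file layout:]

import numpy as N
import multiprocessing as MP

def average(path):
    # Recreate the memmap inside the worker process. Only the path is
    # pickled, so the memmap's underlying file handle never crosses the
    # process boundary (which appears to trigger the __del__ warning).
    cube = N.memmap(path, dtype=N.int32, mode='r', shape=(30, 100, 100))
    return [plane.mean() for plane in cube]

if __name__ == '__main__':
    N.arange(30*100*100, dtype=N.int32).tofile('30x100x100_int32.dat')
    pool = MP.Pool(processes=1)
    job = pool.apply_async(average, ['30x100x100_int32.dat'])
    print(job.get())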
Re: [Numpy-discussion] Calling routines from a Fortran library using python
Nils Wagner wrote:
> On Thu, 11 Mar 2010 13:01:33 +0100, Dag Sverre Seljebotn
> da...@student.matnat.uio.no wrote:
> > [earlier exchange snipped]
> >
> > You shouldn't have byref on FDSCAT nor LDSNCAT, as explained by this
> > line:
> >
> > # C :  DSIO(JUNCAT,FDSCAT,IERR,strlen(FDSCAT));
> >
> > Dag Sverre
>
> Sorry, I am a newbie to C. What is the correct way?

my_dsio.dsio(byref(JUNCAT), FDSCAT, byref(IERR), LDSNCAT)

Dag
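[A general ctypes hygiene note that follows from Dag's fix: declaring
argtypes lets ctypes reject wrong argument kinds up front instead of
segfaulting. A sketch under the thread's assumptions — the library name,
the dsio symbol, and the trailing hidden string-length argument are all
taken from Nils's posts:]

from ctypes import CDLL, POINTER, byref, c_char_p, c_int

my_dsio = CDLL('libdsio20_gnu4.so')
# Integers go by reference (Fortran ABI); the string goes as a plain
# char* with its length appended as an extra by-value argument.
my_dsio.dsio.argtypes = [POINTER(c_int), c_char_p, POINTER(c_int), c_int]
my_dsio.dsio.restype = None

JUNCAT = c_int()
IERR = c_int()
FDSCAT = b'dscat.ds'
my_dsio.dsio(byref(JUNCAT), FDSCAT, byref(IERR), len(FDSCAT))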
Re: [Numpy-discussion] Calling routines from a Fortran library using python
On Thu, 11 Mar 2010 13:42:43 +0100, Dag Sverre Seljebotn
da...@student.matnat.uio.no wrote:
> [earlier exchange snipped]
>
> my_dsio.dsio(byref(JUNCAT), FDSCAT, byref(IERR), LDSNCAT)
>
> Dag

Great. It works like a charm.

How can I translate the following C code into Python? I don't know how to
handle HEAD and memcpy. Any pointer would be appreciated. Thanks in
advance.

typedef union {
    int   i;
    float f;
    char  c[4];
} Word;

int  I,J,K,N,IDE,IA,IE,IERR,JUNIT,JUNCAT,NDATA,NREC,LREADY,ONE=1;
Word BUF[100],HEAD[30];

for (I=5;I<LHEAD;I++) HEAD[I].i = 0;
HEAD[ 0].i = 1;
HEAD[ 1].i = LHEAD + NDATA*7;
HEAD[ 2].i = LHEAD;
HEAD[ 3].i = NDATA;
HEAD[ 4].i = 7;
memcpy (HEAD[ 7].c,"DSIO",4);
memcpy (HEAD[ 8].c,"TEST",4);
memcpy (HEAD[ 9].c,"NPCO",4);
memcpy (HEAD[10].c,"    ",4);
memcpy (HEAD[11].c,"DSIO",4);
HEAD[20].i = 1;
HEAD[21].i = NDATA;
STRING = "MM RAD";
DSEINH(STRING,&HEAD[24].i,&ONE,strlen(STRING));

Nils
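[The thread does not answer this, but one possible translation keeps the
union in ctypes. A rough sketch with several assumptions flagged in
comments — NDATA's value, the four-blank string, and the way dseinh is
exported are all guesses patterned on the fragment above:]

from ctypes import CDLL, Union, byref, c_char, c_float, c_int

class Word(Union):
    # Mirrors the C union: int, float, and 4 chars share the same storage.
    _fields_ = [('i', c_int), ('f', c_float), ('c', c_char * 4)]

LHEAD = 30                # from the header excerpt earlier in the thread
NDATA = 7                 # hypothetical; use the real record count
HEAD = (Word * LHEAD)()   # like 'Word HEAD[30];', zero-initialized

HEAD[0].i = 1
HEAD[1].i = LHEAD + NDATA * 7
HEAD[2].i = LHEAD
HEAD[3].i = NDATA
HEAD[4].i = 7
# Assigning bytes to a c_char*4 field copies them in place, which plays
# the role of memcpy(HEAD[n].c, "...", 4).
for n, text in [(7, b'DSIO'), (8, b'TEST'), (9, b'NPCO'),
                (10, b'    '), (11, b'DSIO')]:
    HEAD[n].c = text
HEAD[20].i = 1
HEAD[21].i = NDATA

STRING = b'MM RAD'
ONE = c_int(1)
my_dsio = CDLL('libdsio20_gnu4.so')
# HEAD[n] from a ctypes array shares the array's buffer, so byref(HEAD[24])
# points at word 24, like &HEAD[24].i in C. This assumes dseinh is exported
# the same way dsio is.
my_dsio.dseinh(STRING, byref(HEAD[24]), byref(ONE), len(STRING))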
Re: [Numpy-discussion] multiprocessing shared arrays and numpy
On Thursday 11 March 2010 10:36:42, Gael Varoquaux wrote:
> I believe that your above assertion is 'half' right. First, I think that
> it is not SWAP that the memmapped file uses, but the original disk
> space, thus you avoid running out of SWAP. Second, if you open the same
> data several times without memmapping, I believe it will be duplicated
> in memory. On the other hand, when you memmap, it is not duplicated: if
> you are running several processing jobs on the same data, you save
> memory. I am very much in this case.

Mmh, this is not my experience. During the past month, I proposed in a
course that the students compare the memory consumption of numpy.memmap and
tables.Expr (a module for performing out-of-memory computations in
PyTables). The idea was precisely to show that, contrary to tables.Expr,
numpy.memmap computations do take a lot of memory while they are being
accessed.

I'm attaching a slightly modified version of that exercise. In it, one has
to compute a polynomial over a certain range. Here is the output of the
script for the numpy.memmap case on a machine with 8 GB RAM and 6 GB of
swap:

Total size for datasets: 7629.4 MB
Populating x using numpy.memmap with 500000000 points...
Total file sizes: 4000000000 -- (3814.7 MB)
*** Time elapsed populating: 70.982
Computing: '((.25*x + .75)*x - 1.5)*x - 2' using numpy.memmap
Total file sizes: 8000000000 -- (7629.4 MB)
*** Time elapsed computing: 81.727
10.08user 13.37system 2:33.26elapsed 15%CPU (0avgtext+0avgdata 0maxresident)k
7808inputs+15625008outputs (39major+5750196minor)pagefaults 0swaps

While the computation was going on, I spied on the process with the top
utility, which told me that the total virtual size consumed by the Python
process was 7.9 GB, with *resident* memory of 6.7 GB (!). And this should
not be just a top malfunction, because I checked that, by the end of the
computation, my machine started to swap some processes out (i.e. the
working set above was too large to allow the OS to keep everything in
memory).

Now, just for the sake of comparison, I tried running the same script but
using tables.Expr. Here is the output:

Total size for datasets: 7629.4 MB
Populating x using tables.Expr with 500000000 points...
Total file sizes: 4000631280 -- (3815.3 MB)
*** Time elapsed populating: 78.817
Computing: '((.25*x + .75)*x - 1.5)*x - 2' using tables.Expr
Total file sizes: 8001261168 -- (7630.6 MB)
*** Time elapsed computing: 155.836
13.11user 18.59system 3:58.61elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
7842784inputs+15632208outputs (28major+940347minor)pagefaults 0swaps

and top was telling me that memory consumption was 148 MB of total virtual
size and just 44 MB resident (as expected, because the computation was
really made using an out-of-core algorithm).

Interestingly, when using compression (Blosc level 4, in this case), the
time to do the computation with tables.Expr is reduced a lot:

Total size for datasets: 7629.4 MB
Populating x using tables.Expr with 500000000 points...
Total file sizes: 1080130765 -- (1030.1 MB)
*** Time elapsed populating: 30.005
Computing: '((.25*x + .75)*x - 1.5)*x - 2' using tables.Expr
Total file sizes: 2415761895 -- (2303.9 MB)
*** Time elapsed computing: 40.048
37.11user 6.98system 1:12.88elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
45312inputs+4720568outputs (4major+989323minor)pagefaults 0swaps

while memory consumption is barely the same as above: 148 MB / 45 MB.

So, in my experience, numpy.memmap is really using that large chunk of
memory (unless my testbed is badly programmed, in which case I'd be
grateful if you can point out what's wrong).

-- 
Francesc Alted

#######################################################################
# This script compares the speed of the computation of a polynomial
# for different (numpy.memmap and tables.Expr) out-of-memory paradigms.
#
# Author: Francesc Alted
# Date: 2010-02-03
#######################################################################

import os
import sys
from time import time
import numpy as np
import tables as tb

expr = "((.25*x + .75)*x - 1.5)*x - 2"  # a computer-friendly polynomial
N = 500*1000*1000    # the number of points to compute expression
step = 100*1000      # perform calculation
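[For readers who have not seen tables.Expr, a minimal sketch of the
out-of-core pattern the benchmark relies on, written against the current
PyTables API (tables.open_file / create_carray); file and array names are
illustrative:]

import numpy as np
import tables as tb

f = tb.open_file('poly.h5', mode='w')

# On-disk operand: a chunked array populated once.
x = f.create_carray('/', 'x', atom=tb.Float64Atom(), shape=(1000*1000,))
x[:] = np.linspace(-1, 1, 1000*1000)

# On-disk destination for the result.
out = f.create_carray('/', 'out', atom=tb.Float64Atom(), shape=(1000*1000,))

# tables.Expr picks up 'x' from the surrounding scope and evaluates the
# expression chunk by chunk, so memory use stays small and flat.
e = tb.Expr("((.25*x + .75)*x - 1.5)*x - 2")
e.set_output(out)
e.eval()
f.close()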
Re: [Numpy-discussion] multiprocessing shared arrays and numpy
On Thu, Mar 11, 2010 at 02:26:49PM +0100, Francesc Alted wrote:
> > I believe that your above assertion is 'half' right. First, I think
> > that it is not SWAP that the memmapped file uses, but the original
> > disk space, thus you avoid running out of SWAP. Second, if you open
> > the same data several times without memmapping, I believe it will be
> > duplicated in memory. On the other hand, when you memmap, it is not
> > duplicated: if you are running several processing jobs on the same
> > data, you save memory. I am very much in this case.
>
> Mmh, this is not my experience. During the past month, I proposed in a
> course that the students compare the memory consumption of numpy.memmap
> and tables.Expr (a module for performing out-of-memory computations in
> PyTables).
[snip]
> So, in my experience, numpy.memmap is really using that large chunk of
> memory (unless my testbed is badly programmed, in which case I'd be
> grateful if you can point out what's wrong).

OK, so what you are saying is that my assertion #1 was wrong. Fair enough;
as I was writing it, I was thinking that I had no hard facts to back it.

How about assertion #2? I can think only of this 'story' to explain why I
can run parallel computations when I use memmap that blow up if I don't
use memmap.

Also, could it be that the memmap mode changes things? I use only the 'r'
mode, which is read-only.

This is all very interesting, and you have many more insights into these
problems than I do. Would you be interested in coming to EuroSciPy in Paris
to give a 1- or 2-hour tutorial on memory and I/O problems and how you
address them with PyTables? It would be absolutely thrilling. I must warn
you that I am afraid we won't be able to pay for your trip, though, as I
want to keep the price of the conference low.

Best,

Gaël
Re: [Numpy-discussion] multiprocessing shared arrays and numpy
On Thursday 11 March 2010 14:35:49, Gael Varoquaux wrote:
> > So, in my experience, numpy.memmap is really using that large chunk of
> > memory (unless my testbed is badly programmed, in which case I'd be
> > grateful if you can point out what's wrong).
>
> OK, so what you are saying is that my assertion #1 was wrong. Fair
> enough; as I was writing it, I was thinking that I had no hard facts to
> back it.
>
> How about assertion #2? I can think only of this 'story' to explain why
> I can run parallel computations when I use memmap that blow up if I
> don't use memmap.

Well, I must say that I have no experience with running memmapped arrays in
parallel computations, but it sounds like they can actually behave as
shared-memory arrays, so yes, you may definitely be right about #2:
memmapped data is not duplicated when accessed in parallel by different
processes (in read-only mode, of course), which is certainly a very
interesting technique for sharing data among parallel processes. Thanks for
pointing this out!

> Also, could it be that the memmap mode changes things? I use only the
> 'r' mode, which is read-only.

I don't think so. When doing the computation, I open the x values in
read-only mode, and the memory consumption is still there.

> This is all very interesting, and you have many more insights into these
> problems than I do. Would you be interested in coming to EuroSciPy in
> Paris to give a 1- or 2-hour tutorial on memory and I/O problems and how
> you address them with PyTables? [...]

Yes, no problem. I was already thinking about presenting something at
EuroSciPy, and a tutorial about PyTables/memory I/O would be really great
for me. We can nail down the details off-list.

-- 
Francesc Alted
[Numpy-discussion] subclassing ndarray in python3
Now that the trunk has some support for python3, I am working on making
Quantities work with python3 as well. I'm running into some problems
related to subclassing ndarray that can be illustrated with a simple
script, reproduced below. It looks like there is a problem with the
reflected operations; I see problems with __rmul__ and __radd__, but not
with __mul__ and __add__:

import numpy as np

class A(np.ndarray):
    def __new__(cls, *args, **kwargs):
        return np.ndarray.__new__(cls, *args, **kwargs)

class B(A):
    def __mul__(self, other):
        return self.view(A).__mul__(other)
    def __rmul__(self, other):
        return self.view(A).__rmul__(other)
    def __add__(self, other):
        return self.view(A).__add__(other)
    def __radd__(self, other):
        return self.view(A).__radd__(other)

a = A((10,))
b = B((10,))

print('A __mul__:')
print(a.__mul__(2))                   # ok
print(a.view(np.ndarray).__mul__(2))  # ok
print(a*2)                            # ok

print('A __rmul__:')
print(a.__rmul__(2))                  # yields NotImplemented
print(a.view(np.ndarray).__rmul__(2)) # yields NotImplemented
print(2*a)                            # ok !!??

print('B __mul__:')
print(b.__mul__(2))                   # ok
print(b.view(A).__mul__(2))           # ok
print(b.view(np.ndarray).__mul__(2))  # ok
print(b*2)                            # ok

print('B __add__:')
print(b.__add__(2))                   # ok
print(b.view(A).__add__(2))           # ok
print(b.view(np.ndarray).__add__(2))  # ok
print(b+2)                            # ok

print('B __rmul__:')
print(b.__rmul__(2))                  # yields NotImplemented
print(b.view(A).__rmul__(2))          # yields NotImplemented
print(b.view(np.ndarray).__rmul__(2)) # yields NotImplemented
print(2*b)  # yields TypeError: unsupported operand type(s) for *: 'int' and 'B'

print('B __radd__:')
print(b.__radd__(2))                  # yields NotImplemented
print(b.view(A).__radd__(2))          # yields NotImplemented
print(b.view(np.ndarray).__radd__(2)) # yields NotImplemented
print(2+b)  # yields TypeError: unsupported operand type(s) for +: 'int' and 'B'
Re: [Numpy-discussion] crash at prompt exit after running test
hi there,

I am adding this to this thread and not to the trac, because I am not sure
whether it adds noise or a piece of info. I just downloaded the scipy trunk
and built it, and ran nosetests on it, which bombed instantly. So I tried
to get into subdirs to check test scripts separately, and here is one:

[co...@jarrett tests]$ ~/.local/bin/ipython test_integrate.py
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/home/cohen/sources/python/scipy/scipy/integrate/tests/test_integrate.py in <module>()
    208
    209 if __name__ == "__main__":
--> 210     run_module_suite()
    211
    212

/home/cohen/.local/lib/python2.6/site-packages/numpy/testing/nosetester.pyc in run_module_suite(file_to_run)
     75         f = sys._getframe(1)
     76         file_to_run = f.f_locals.get('__file__', None)
---> 77         assert file_to_run is not None
     78
     79     import_nose().run(argv=['', file_to_run])

AssertionError:
python: Modules/gcmodule.c:277: visit_decref: Assertion `gc->gc.gc_refs != 0' failed.
Aborted (core dumped)
[co...@jarrett tests]$ pwd
/home/cohen/sources/python/scipy/scipy/integrate/tests

The bomb is the same, but the context seems different... I leave that to
the experts :)

Johann

On 03/10/2010 06:06 PM, Charles R Harris wrote:
> On Wed, Mar 10, 2010 at 10:39 AM, Bruce Southey bsout...@gmail.com wrote:
> > On 03/10/2010 08:59 AM, Pauli Virtanen wrote:
> > > Wed, 10 Mar 2010 15:40:04 +0100, Johann Cohen-Tanugi wrote:
> > > > Pauli, isn't it hopeless to follow the execution of the source
> > > > code when the crash actually occurs when I exit, and not when I
> > > > execute? I would have to understand enough of this umath_tests.c.src
> > > > to spot a refcount error or things like that.
> > >
> > > Yeah, it's not easy, and requires knowing how to track this type of
> > > error. I didn't actually mean that you should try to do it, just
> > > posed it as a general challenge to all interested parties :)
> > >
> > > On a more serious note, maybe there's a compilation flag or
> > > something in Python that warns when refcounts go negative (or
> > > something).
> > >
> > > Cheers,
> > > Pauli
> >
> > Hi,
> >
> > I think I managed to find this. I reverted my svn version
> > ($ svn update -r 8262) and cleaned both the build and installation
> > directories. It occurred with changeset 8262 (earlier changesets
> > appear okay but later ones do not):
> > http://projects.scipy.org/numpy/changeset/8262
> >
> > Specifically, in the file
> > numpy/core/code_generators/generate_ufunc_api.py there is an extra
> > call that should have been deleted on line 54(?):
> >
> >     Py_DECREF(numpy);
> >
> > I attached a patch to ticket 1425:
> > http://projects.scipy.org/numpy/ticket/1425
>
> Looks like my bad. I'm out of town at the moment so someone else needs
> to apply the patch. That whole bit of code could probably use a daylight
> audit.
>
> Chuck
Re: [Numpy-discussion] subclassing ndarray in python3
Hi Darren,

On Thu, 2010-03-11 at 11:11 -0500, Darren Dale wrote:
> Now that the trunk has some support for python3, I am working on making
> Quantities work with python3 as well. I'm running into some problems
> related to subclassing ndarray that can be illustrated with a simple
> script, reproduced below. It looks like there is a problem with the
> reflected operations; I see problems with __rmul__ and __radd__, but
> not with __mul__ and __add__:

Thanks for testing. I wish the test suite was more complete (hint! hint! :)

Yes, Python 3 introduced some semantic changes in how subclasses of builtin
classes (= written in C) inherit the __r*__ operations. Below I'll try to
explain what is going on.

We probably need to change some things to make this work better on Py3,
within the bounds of what we are able to do. Suggestions are welcome. The
most obvious one could be to explicitly implement __rmul__ etc. on
Python 3.

[clip]
> class A(np.ndarray):
>     def __new__(cls, *args, **kwargs):
>         return np.ndarray.__new__(cls, *args, **kwargs)
>
> class B(A):
>     def __mul__(self, other):
>         return self.view(A).__mul__(other)
>     def __rmul__(self, other):
>         return self.view(A).__rmul__(other)
>     def __add__(self, other):
>         return self.view(A).__add__(other)
>     def __radd__(self, other):
>         return self.view(A).__radd__(other)
[clip]
> print('A __rmul__:')
> print(a.__rmul__(2))                  # yields NotImplemented
> print(a.view(np.ndarray).__rmul__(2)) # yields NotImplemented

Correct. ndarray does not implement __rmul__, but relies on an automatic
wrapper generated by Python. The automatic wrapper (wrap_binaryfunc_r) does
the following:

1. Is `type(other)` a subclass of `type(self)`? If yes, call __mul__ with
   swapped arguments.
2. If not, bail out with NotImplemented.

So it bails out. Previously, the ndarray type had a flag that made Python
skip the subclass check. That does not exist any more in Python 3, and this
is the root of the issue.

> print(2*a)                            # ok !!??

Here, Python checks:

1. Does nb_multiply from the left op succeed? Nope, since floats don't
   know how to multiply ndarrays.
2. Does nb_multiply from the right op succeed? Here the execution passes
   *directly* to array_multiply, completely skipping the __rmul__ wrapper.

Note also that in the C-level number protocol there is only a single
multiplication function for both left and right multiplication.

[clip]
> print('B __rmul__:')
> print(b.__rmul__(2))                  # yields NotImplemented
> print(b.view(A).__rmul__(2))          # yields NotImplemented
> print(b.view(np.ndarray).__rmul__(2)) # yields NotImplemented
> print(2*b)  # yields TypeError: unsupported operand type(s) for *: 'int' and 'B'

But here, the subclass calls the wrapper ndarray.__rmul__, which wants to
be careful with types, and hence fails.

Yes, probably explicitly defining __rmul__ for ndarray could be the right
solution. Please file a bug report on this.

Cheers,
Pauli
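[Until something like that lands in numpy itself, Pauli's analysis suggests
a user-side workaround: give the subclass a real Python-level __rmul__ so
the inherited C-level wrapper is never consulted. A sketch, illustrative
only and not the eventual numpy fix:]

import numpy as np

class B(np.ndarray):
    def __new__(cls, *args, **kwargs):
        return np.ndarray.__new__(cls, *args, **kwargs)

    def __mul__(self, other):
        return np.multiply(self.view(np.ndarray), other).view(type(self))

    def __rmul__(self, other):
        # Multiplication is commutative here, so the reflected operation
        # can simply delegate to the forward one instead of relying on
        # the inherited wrapper that returns NotImplemented.
        return self.__mul__(other)

b = B((10,))
b[:] = 1
print(2 * b)   # now dispatches to B.__rmul__ and succeeds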
Re: [Numpy-discussion] crash at prompt exit after running test
On 03/11/2010 02:01 PM, Johann Cohen-Tanugi wrote:
> [traceback snipped]
>
> The bomb is the same, but the context seems different... I leave that
> to the experts :)
>
> Johann

Yes, I think it is the same issue, as I do not have the problem after
fixing the following file and rebuilding numpy and scipy:

numpy/core/code_generators/generate_ufunc_api.py

Bruce
Re: [Numpy-discussion] crash at prompt exit after running test
is your fix committed?

On 03/11/2010 09:47 PM, Bruce Southey wrote:
> Yes, I think it is the same issue, as I do not have the problem after
> fixing the following file and rebuilding numpy and scipy:
>
> numpy/core/code_generators/generate_ufunc_api.py
>
> Bruce
Re: [Numpy-discussion] subclassing ndarray in python3
Hi Pauli,

On Thu, Mar 11, 2010 at 3:38 PM, Pauli Virtanen p...@iki.fi wrote:
> Thanks for testing. I wish the test suite was more complete (hint!
> hint! :)

I'll be happy to contribute, but lately I get a few 15-30 minute blocks a
week for this kind of work (hence the short attempt to work on Quantities
this morning), and it's not likely to let up for about 3 weeks.

> Yes, probably explicitly defining __rmul__ for ndarray could be the
> right solution. Please file a bug report on this.

Done: http://projects.scipy.org/numpy/ticket/1426

Cheers, and *thank you* for all you have already done to support python-3,
Darren
Re: [Numpy-discussion] arange including stop value?
davefallest wrote:
> ...
> In [3]: np.arange(1.01, 1.1, 0.01)
> Out[3]: array([ 1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1 ])
>
> Why does the ... np.arange command end up including my stop value?

From the help for arange:

    For floating point arguments, the length of the result is
    ``ceil((stop - start)/step)``. Because of floating point overflow,
    this rule may result in the last element of `out` being greater
    than `stop`.
Re: [Numpy-discussion] arange including stop value?
On 11 March 2010 19:30, Tom K. t...@kraussfamily.org wrote:
> davefallest wrote:
> > In [3]: np.arange(1.01, 1.1, 0.01)
> > Out[3]: array([ 1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1 ])
> >
> > Why does the ... np.arange command end up including my stop value?
>
> [arange docstring snipped]

Don't use arange for floating-point values. Use linspace instead.

Anne
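[A quick illustration of the difference, reusing the numbers from the
question:]

import numpy as np

# arange with a float step: the element count comes from
# ceil((stop - start)/step), so rounding error can add an extra point.
print(np.arange(1.01, 1.1, 0.01))
# -> [ 1.01  1.02  1.03  1.04  1.05  1.06  1.07  1.08  1.09  1.1 ]

# linspace takes an explicit count and includes the endpoint by default,
# so there is no surprise about the last element.
print(np.linspace(1.01, 1.09, 9))
# -> [ 1.01  1.02  1.03  1.04  1.05  1.06  1.07  1.08  1.09]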
Re: [Numpy-discussion] crash at prompt exit after running test
On Thu, Mar 11, 2010 at 3:57 PM, Johann Cohen-Tanugi co...@lpta.in2p3.fr wrote:
> is your fix committed?

No. Pauli thinks the problem may lie elsewhere. I haven't had time to look
things over, but it is possible that the changes in the generated API
exposed a bug elsewhere.

<snip>

Chuck