[Numpy-discussion] How to make lazy derived arrays in a recarray view of a memmap on large files
Hi numpy forum. I need to efficiently handle some large (300 MB) record-like binary files, where some data fields are less than a byte and thus cannot be mapped in a record dtype directly. I would like to be able to access these derived arrays in a memory-efficient manner, but I cannot figure out how to achieve this. My application of the derived arrays would never be to do operations on the entire array, but rather to iterate over some selected elements and do something with them - operations which seem well suited for doing on demand. I wrote a related post yesterday, which I have not received any response on. I am now posting again with another, perhaps clearer, example which I believe describes my problem spot on:

from numpy import *
# Python.exe memory use here: 8.14 MB
desc = dtype([('baconandeggs', 'u1'), ('spam', 'u1'), ('parrots', 'u1')])
index = memmap('g:/id-2008-10-25-17-ver4.idx', dtype=desc, mode='r').view(recarray)
# The index file is very large, contains 292 MB of data.
# Python.exe memory use: 8.16 MB, only 20 kB extra for memmap mapped to recarray.

# The following "instant" operation takes a few secs working on 3*10^8 elements.
# How can I derive a new array in a lazy/on-demand/memmap manner?
index.bacon = index.baconandeggs >> 4
# python.exe memory use: 595 MB! Not surprising, but how to do better?

# Another derived array, which is resource demanding
index.eggs = index.baconandeggs & 0x0F
# python.exe memory usage is now 731 MB!
What I'd like to do is implement a class, LazyBaconEggsSpamParrots, which encapsulates the derived arrays such that I could do:

besp = LazyBaconEggsSpamParrots('baconeggsspamparrots.idx')
for b in besp.bacon:  # Iterate lazily
    spam(b)
# Only derive the 1000 needed elements, don't do all of them
dosomething(besp.bacon[1000000:1001000])

I envision the class would look something like this:

class LazyBaconEggsSpamParrots(object):
    def __init__(self, filename):
        desc = dtype([('baconandeggs', 'u1'), ('spam', 'u1'), ('parrots', 'u1')])
        self._data = memmap(filename, dtype=desc, mode='r').view(recarray)
        # Expose the one-to-one data directly
        self.spam = self._data.spam
        self.parrots = self._data.parrots
        # This would work but costs way too much memory:
        # self.bacon = self._data.baconandeggs >> 4
        # self.eggs = self._data.baconandeggs & 0x0F

    def __getattr__(self, attr_name):
        if attr_name == 'bacon':
            ...  # return bacon in an on-demand manner, but how?
        elif attr_name == 'eggs':
            ...  # return eggs in an on-demand manner, but how?
        else:
            # If the name is not a data attribute treat it as a normal
            # non-existing attribute - raise AttributeError
            raise AttributeError(attr_name)

but how to do the lazy part of it?

-- Kim

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
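One way to get the lazy behaviour Kim asks for (a sketch only, assuming the `>> 4` / `& 0x0F` unpacking and the field names from the post; `LazyField` is a hypothetical helper, not a numpy class) is a small proxy whose `__getitem__` applies the bit operation to just the slice being read:

```python
import numpy as np

class LazyField(object):
    """Wraps a (memmapped) array; applies `op` only to the elements
    actually requested, so the full derived array is never built."""
    def __init__(self, base, op):
        self._base = base
        self._op = op

    def __len__(self):
        return len(self._base)

    def __getitem__(self, index):
        # Only the requested slice is read from disk and transformed.
        return self._op(self._base[index])

    def __iter__(self):
        # Iterate in modest chunks to keep memory use bounded.
        chunk = 65536
        for start in range(0, len(self._base), chunk):
            for value in self._op(self._base[start:start + chunk]):
                yield value

class LazyBaconEggsSpamParrots(object):
    def __init__(self, filename):
        desc = np.dtype([('baconandeggs', 'u1'), ('spam', 'u1'), ('parrots', 'u1')])
        self._data = np.memmap(filename, dtype=desc, mode='r').view(np.recarray)
        # One-to-one fields are exposed directly
        self.spam = self._data.spam
        self.parrots = self._data.parrots

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # i.e. for the derived fields.
        if name == 'bacon':
            return LazyField(self._data.baconandeggs, lambda a: a >> 4)
        if name == 'eggs':
            return LazyField(self._data.baconandeggs, lambda a: a & 0x0F)
        raise AttributeError(name)
```

With this, `besp.bacon[1000000:1001000]` touches only those 1000 records of the memmap, and iteration streams the data chunkwise instead of materialising the 595 MB array.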
[Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator
Announcing Numexpr 1.1

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like 3*a+4*b) are accelerated and use less memory than doing the same calculation in Python.

The expected speed-ups for Numexpr with respect to NumPy are between 0.95x and 15x, with 3x or 4x being typical values. The strided and unaligned case has been optimized too, so if the expression contains such arrays, the speed-up can increase significantly. Of course, you will need to operate with large arrays (typically larger than the cache size of your CPU) to see these improvements in performance.

This release is mainly intended to bring in some of the improvements made to the Numexpr version integrated in PyTables. So, this standalone version of Numexpr will benefit from the well-tested PyTables version that has been in production for more than a year now. In case you want to know in more detail what has changed in this version, have a look at ``RELEASE_NOTES.txt`` in the tarball.

Where can I find Numexpr?
=========================
The project is hosted at Google code at: http://code.google.com/p/numexpr/

Share your experience
=====================
Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy!

-- Francesc Alted
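A minimal usage sketch of what the announcement describes (assuming numexpr 1.1 is installed; `3*a+4*b` is the expression from the text above):

```python
import numpy as np
import numexpr as ne

a = np.arange(1e6)
b = np.arange(1e6)

# Plain NumPy builds full temporaries for 3*a and 4*b; numexpr compiles
# the string once and evaluates it blockwise, in a single pass.
result = ne.evaluate("3*a + 4*b")

assert np.allclose(result, 3*a + 4*b)
```

The string form is what lets numexpr see the whole expression at once and avoid the intermediate arrays.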
Re: [Numpy-discussion] remove need for site.cfg on default
On Thu, Jan 15, 2009 at 11:53 PM, David Cournapeau courn...@gmail.com wrote:

> On Fri, Jan 16, 2009 at 12:18 AM, Darren Dale dsdal...@gmail.com wrote:
>> Hi Jarrod,
>> On Wed, Jan 14, 2009 at 2:21 AM, Jarrod Millman mill...@berkeley.edu wrote:
>>> Due to the fact that I was tired of adding site.cfg to scipy and numpy when building on Fedora and Ubuntu systems, as well as a scipy ticket (http://scipy.org/scipy/numpy/ticket/985), I decided to try to add default system paths to numpy.distutils. You can find more details on this ticket: http://projects.scipy.org/scipy/scipy/ticket/817 I would like to have as many people test this as possible. So I would like everyone who has had to build numpy/scipy with a site.cfg, despite having installed all the dependencies in the system default locations, to test this: please update to the numpy trunk and build numpy and scipy without your old site.cfg. Regardless of whether it works or not, I would appreciate it if you could let me know.
>> I tested on Gentoo, which does some non-standard stuff in an attempt to be able to switch between implementations. The libraries in /usr/lib are symlinks to the implementations in /usr/lib/blas/, and you can have various implementations like atlas, reference, or threaded-atlas, so I can run a program called eselect that lets me switch between the implementations. For some reason, Gentoo renames the f77blas libs to simply blas.
> That really baffles me. Why do distributions think it is OK to randomly rename libraries? They certainly could handle an eselect system without renaming libraries, so that they avoid breaking the software depending on them.

I am in complete agreement. I've raised the issue with them before and didn't make any headway. Maybe it's time to try again.

Darren
Re: [Numpy-discussion] error handling with f2py?
On Thu, January 15, 2009 6:17 pm, Sturla Molden wrote:

> Is it possible to make f2py raise an exception if a fortran routine signals an error? If I e.g. have subroutine foobar(a, ierr), can I get an exception automatically raised if ierr != 0?

Yes, for that you need to provide your own fortran call code using the f2py callstatement construct. The initial fortran call code can be obtained from the f2py generated <modulename>module.c file, for instance. An example follows below.

Fortran file foo.f:
---
      subroutine foo(a, ierr)
      integer a
      integer ierr
      if (a.gt.10) then
        ierr = 2
      else
        if (a.gt.5) then
          ierr = 1
        else
          ierr = 0
        end if
      end if
      end

Generated (f2py -m m foo.f) and then modified signature file m.pyf:
---
!    -*- f90 -*-
! Note: the context of this file is case sensitive.
python module m ! in
    interface  ! in :m
        subroutine foo(a,ierr) ! in :m:foo.f
            integer :: a
            integer :: ierr
            intent (in, out) a
            intent (hide) ierr
            callstatement '''
            (*f2py_func)(&a, &ierr);
            if (ierr==1) {
                PyErr_SetString(PyExc_ValueError, "a is gt 5");
            }
            if (ierr==2) {
                PyErr_SetString(PyExc_ValueError, "a is gt 10");
            }
            '''
        end subroutine foo
    end interface
end python module m
! This file was auto-generated with f2py (version:2_5618).
! See http://cens.ioc.ee/projects/f2py2e/

Build the extension module and use from python:
---
$ f2py -c m.pyf foo.f
$ python
>>> import m
>>> m.foo(30)
---------------------------------------------------------------------------
<type 'exceptions.ValueError'>            Traceback (most recent call last)
/home/pearu/test/f2py/exc/<ipython console> in <module>()
<type 'exceptions.ValueError'>: a is gt 10
>>> m.foo(6)
---------------------------------------------------------------------------
<type 'exceptions.ValueError'>            Traceback (most recent call last)
/home/pearu/test/f2py/exc/<ipython console> in <module>()
<type 'exceptions.ValueError'>: a is gt 5
>>> m.foo(4)
4

HTH,
Pearu
Re: [Numpy-discussion] Singular Matrix problem with Matplotlib in Numpy (Windows - AMD64)
Hello. I am terribly sorry; I was mistaken last night. I had the latest Matplotlib version 0.98.5.2 and I thought the bug was fixed, but it hasn't been. Let me explain. In the file MPL_isnan.h, line 26, there is a declaration:

typedef long int MPL_Int64;

This is fine for Linux 64-bit, but NOT for Windows XP 64-bit! For Windows the declaration should be:

typedef long long MPL_Int64;

This bug has caused me a LOT of late nights, and last night was one of them. The declaration is correct for Linux 64-bit, and I guess Matplotlib was developed on Linux because of this declaration. That is also why I thought the bug was fixed, but this morning I realised that I was looking at the wrong console. So, in summary: for Matplotlib 0.98.5.2 and Numpy 1.2.1 to work without any problems - that is, compiling and using Numpy and Matplotlib on Windows XP 64-bit with the AMD 64-bit compile environment - change line 26 in the file MPL_isnan.h from "long int" to "long long". I also previously suggested switching MKL and ACML etc., but with this change everything is fine: one can choose any math library and it works. Writing a small test application using sizeof on different platforms highlights the problem. Thanks. George.
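The sizeof test George suggests at the end can even be run from Python via ctypes rather than a C program (a quick sketch; the 4-vs-8 values are the LLP64/LP64 difference described above):

```python
import ctypes

# 64-bit Linux (LP64): a C `long` is 8 bytes, so `typedef long int MPL_Int64`
# happens to work there.
# 64-bit Windows (LLP64): a C `long` is only 4 bytes, hence the fix to
# `typedef long long MPL_Int64`.
size_long = ctypes.sizeof(ctypes.c_long)          # 4 on Win64, 8 on Linux64
size_longlong = ctypes.sizeof(ctypes.c_longlong)  # 8 on both

# `long long` is 8 bytes on both platforms, which is why it is the safe choice.
assert size_longlong == 8
```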
Re: [Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator
Francesc Alted schrieb:

> Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like 3*a+4*b) are accelerated and use less memory than doing the same calculation in Python. The expected speed-ups for Numexpr with respect to NumPy are between 0.95x and 15x, with 3x or 4x being typical values. The strided and unaligned case has been optimized too, so if the expression contains such arrays, the speed-up can increase significantly. Of course, you will need to operate with large arrays (typically larger than the cache size of your CPU) to see these improvements in performance.

Just recently I had a more detailed look at numexpr. Clever idea, easy to use! I can confirm a typical performance gain of 3x if you work on large arrays (100k entries). I also gave a try to the vector math library (VML) contained in Intel's Math Kernel Library, which offers a fast implementation of mathematical functions operating on arrays. First I implemented a C extension providing new ufuncs. This gave me a big performance gain, e.g., 2.3x (5x) for sin, 6x (10x) for exp, 7x (15x) for pow, and 3x (6x) for division (no gain for add, sub, mul). The values in parentheses are what I get if I allow VML to use several threads and to employ both cores of my Intel Core2Duo computer. For large arrays (100M entries) this performance gain is reduced because of limited memory bandwidth. At this point I stumbled across numexpr and modified it to use the VML functions. For sufficiently long and complex numerical expressions I could get the maximum performance also for large arrays. Together with VML, numexpr seems to be an extremely powerful way to get optimum performance. I would like to see numexpr extended to (optionally) make use of fast vectorized math functions. There is one "but": VML supports (at the moment) only math on contiguous arrays. At a first try I didn't understand how to enforce this limitation in numexpr.
I also gave a quick try to the equivalent vector math library from AMD, acml_mv. I only tried sin and log; it gave me the same performance (on an Intel processor!) as Intel's VML. I was also playing around with the block size in numexpr. What is the rationale that led to the current block size of 128? Especially with VML, a larger block size of 4096 instead of 128 made it possible to use multithreading in VML efficiently.

> Share your experience
> =====================
> Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

I was missing support for single precision floats.

Great work!

Gregor
Re: [Numpy-discussion] numpy.testing.asserts and masked array
On Jan 16, 2009, at 10:51 AM, josef.p...@gmail.com wrote:

> I have a regression result with masked arrays that produces a masked array output, estm5.yhat, and I want to test equality to the benchmark case, estm1.yhat, with the asserts in numpy.testing, but I am getting strange results. ... What's the trick to assert_almost_equal for masked arrays?

Use numpy.ma.testutils.assert_almost_equal instead.
Re: [Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator
A Friday 16 January 2009, j...@physics.ucf.edu escrigué:

> Hi Francesc,
>> Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like 3*a+4*b) are accelerated and use less memory than doing the same calculation in Python.
> Please pardon my ignorance, as I know this project has been around for a while. This looks very exciting, but either it's cumbersome, or I'm not understanding exactly what's being fixed. If you can accelerate evaluation, why not just integrate the faster math into numpy, rather than having two packages? Or is this something that is only an advantage when the expression is given as a string (and why is that the case)? It would be helpful if you could put the answer on your web page and in your standard release blurb in some compact form. I guess what I'm really looking for when I read one of those is a quick answer to the question "should I look into this?".

Well, there is a link on the project page to the Overview section of the wiki, but perhaps it is a bit hidden. I've added some blurb as you suggested to the main page and another link to the Overview wiki page. Hope that, by reading the new blurb, you can see why it accelerates expression evaluation with regard to NumPy. If not, tell me and I will try to come up with something more comprehensible.

> Right now, I'm not quite sure whether the problem you are solving is merely the case of expressions-in-strings, and there is no advantage for expressions-in-code, or whether your expressions-in-strings are faster than numpy's expressions-in-code. In either case, it would appear this would be a good addition to the numpy core, and it's past 1.0, so why keep it separate? Even if there is value in having a non-numpy version, is there not also value in accelerating numpy by default?

Having the expression encapsulated in a string has the advantage that you know exactly the part of the code that you want to parse and accelerate.
Making NumPy understand parts of the Python code that can be accelerated sounds more like a true JIT for Python, and this is something that is not trivial at all (although, with the advent of PyPy, some efforts are appearing in this direction [1]).

[1] http://www.enthought.com/~ischnell/paper.html

Cheers,

-- Francesc Alted
Re: [Numpy-discussion] error handling with f2py?
On 1/16/2009 2:16 PM, Pearu Peterson wrote:

> Yes, for that you need to provide your own fortran call code using the f2py callstatement construct. The initial fortran call code can be obtained from the f2py generated <modulename>module.c file, for instance.

Thank you, Pearu :) f2py is really a wonderful tool.

Sturla Molden
Re: [Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator
Hi Francesc, this is a wonderful project! I was just wondering if you would / could support single precision float arrays? In 3+D image analysis we generally don't have enough memory to afford double precision, and we could save ourselves lots of extra C (or Cython) coding if we could use numexpr ;-)

Thanks,
Sebastian Haase

On Fri, Jan 16, 2009 at 5:04 PM, Francesc Alted fal...@pytables.org wrote:
> [...]
Re: [Numpy-discussion] numpy.testing.asserts and masked array
On Fri, Jan 16, 2009 at 10:59 AM, Pierre GM pgmdevl...@gmail.com wrote:

> On Jan 16, 2009, at 10:51 AM, josef.p...@gmail.com wrote:
>> I have a regression result with masked arrays ... What's the trick to assert_almost_equal for masked arrays?
> Use numpy.ma.testutils.assert_almost_equal instead.

Thanks. When I looked at the test files for mstats, I forgot to check the imports.

Josef
Re: [Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator
A Friday 16 January 2009, Gregor Thalhammer escrigué:

> I also gave a try to the vector math library (VML) contained in Intel's Math Kernel Library, which offers a fast implementation of mathematical functions operating on arrays. First I implemented a C extension providing new ufuncs. This gave me a big performance gain, e.g., 2.3x (5x) for sin, 6x (10x) for exp, 7x (15x) for pow, and 3x (6x) for division (no gain for add, sub, mul).

Wow, pretty nice speed-ups indeed! In fact I was thinking of including support for threading in Numexpr (I don't think it would be too difficult, but let's see). BTW, do you know how VML is able to achieve a speedup of 6x for a sin() function? I suppose this is because they are using SSE instructions, but are these also available for 64-bit double precision items?

> The values in parentheses are what I get if I allow VML to use several threads and to employ both cores of my Intel Core2Duo computer. For large arrays (100M entries) this performance gain is reduced because of limited memory bandwidth. At this point I stumbled across numexpr and modified it to use the VML functions. For sufficiently long and complex numerical expressions I could get the maximum performance also for large arrays.

Cool.

> Together with VML, numexpr seems to be an extremely powerful way to get optimum performance. I would like to see numexpr extended to (optionally) make use of fast vectorized math functions.

Well, if you can provide the code, I'd be glad to include it in numexpr. The only requirement is that VML must be optional during the build of the package.

> There is one "but": VML supports (at the moment) only math on contiguous arrays. At a first try I didn't understand how to enforce this limitation in numexpr.

No problem.
At the end of numexpr/necompiler.py you will see some code like:

    # All the opcodes can deal with strided arrays directly as
    # long as they are unidimensional (strides in other
    # dimensions are dealt with within the extension), so we don't
    # need a copy for the strided case.
    if not b.flags.aligned:
        ...

which you can replace with something like:

    # need a copy for the strided case
    if VML_available and not b.flags.contiguous:
        b = b.copy()
    elif not b.flags.aligned:
        ...

That would be enough for ensuring that all the arrays are contiguous when they hit numexpr's virtual machine. This being said, it is a shame that VML does not have support for strided/unaligned arrays. They are quite common beasts, especially when you work with heterogeneous arrays (aka record arrays).

> I also gave a quick try to the equivalent vector math library from AMD, acml_mv. I only tried sin and log; it gave me the same performance (on an Intel processor!) as Intel's VML. I was also playing around with the block size in numexpr. What is the rationale that led to the current block size of 128? Especially with VML, a larger block size of 4096 instead of 128 made it possible to use multithreading in VML efficiently.

Experimentation. Back in 2006 David found that 128 was optimal for the processors available at that time. With Numexpr 1.1 my experiments show that 256 is a better value for current Core2 processors and most expressions in our benchmark bed (see the benchmarks/ directory); hence, 256 is the new value for the chunksize in 1.1. However, bear in mind that 256 has to be multiplied by the itemsize of each array, so the chunksize is currently 2048 bytes for 64-bit items (int64 or float64) and 4096 for double precision complex arrays, which are probably the sizes that have to be compared with VML.

>> Share your experience
>> Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.
> I was missing support for single precision floats.

Yeah.
This is because nobody has implemented it before, but it is completely doable.

> Great work!

You are welcome! And thanks for the excellent feedback too! Hope we can have a VML-aware numexpr anytime soon ;-)

Cheers,

-- Francesc Alted
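The contiguity condition discussed above can be seen with a toy check in plain NumPy (a sketch; `VML_available` in Francesc's snippet would be a build-time flag, here just assumed True):

```python
import numpy as np

VML_available = True  # assumption: would be determined at build time

b = np.arange(20)[::2]          # a strided view: every second element
assert not b.flags.contiguous   # strided views are not contiguous

# The proposed change: copy strided inputs so that VML kernels,
# which only accept contiguous data, can still be used.
if VML_available and not b.flags.contiguous:
    b = b.copy()

assert b.flags.contiguous       # safe to hand to a contiguous-only kernel
```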
Re: [Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator
A Friday 16 January 2009, Sebastian Haase escrigué:

> Hi Francesc, this is a wonderful project! I was just wondering if you would / could support single precision float arrays?

As I said before, it is doable, but I don't know if I will have enough time to implement this myself.

> In 3+D image analysis we generally don't have enough memory to afford double precision, and we could save ourselves lots of extra C (or Cython) coding if we could use numexpr ;-)

Well, one of the ideas that I've been toying with for a long time is to give Numexpr the capability to work with PyTables disk-based objects. That way, you would be able to evaluate potentially complex expressions using data that is completely on disk. But this might be a completely different thing from what you are talking about.

Cheers,

-- Francesc Alted
Re: [Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator
Francesc Alted wrote:

> A Friday 16 January 2009, j...@physics.ucf.edu escrigué:
>> Right now, I'm not quite sure whether the problem you are solving is merely the case of expressions-in-strings, and there is no advantage for expressions-in-code, or whether your expressions-in-strings are faster than numpy's expressions-in-code. ... Even if there is value in having a non-numpy version, is there not also value in accelerating numpy by default?
> Having the expression encapsulated in a string has the advantage that you know exactly the part of the code that you want to parse and accelerate. Making NumPy understand parts of the Python code that can be accelerated sounds more like a true JIT for Python, and this is something that is not trivial at all (although, with the advent of PyPy, some efforts are appearing in this direction [1]).

A full compiler/JIT isn't needed; there's another route: one could use the Numexpr methodology together with a symbolic expression framework (like SymPy or the one in Sage), i.e. operator overloads and lazy expressions. Combining NumExpr with a symbolic manipulation engine would be very cool IMO. Unfortunately I don't have time myself (and I understand that you don't either; I'm just mentioning it).

Example using pseudo-Sage-like syntax:

a = np.arange(bignum)
b = np.arange(bignum)
x, y = sage.var('x', 'y')
expr = sage.integrate(x + y, x)
z = expr(x=a, y=b)  # z = a**2/2 + a*b, but Numexpr-enabled

-- Dag Sverre
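The operator-overload route Dag sketches can be mocked up without Sage at all. Below is a tiny, hypothetical lazy-expression class (nothing from an actual library) that records the expression and evaluates it only on demand; the `eval` step is exactly where a Numexpr backend could be plugged in:

```python
import numpy as np

class Lazy(object):
    """Records an expression tree; nothing is computed until eval()."""
    def __init__(self, fn):
        self._fn = fn

    def __add__(self, other):
        return Lazy(lambda env: self._fn(env) + _value(other, env))
    __radd__ = __add__

    def __mul__(self, other):
        return Lazy(lambda env: self._fn(env) * _value(other, env))
    __rmul__ = __mul__

    def eval(self, **env):
        # A Numexpr-style backend would compile the whole tree here
        # instead of walking it with plain NumPy operations.
        return self._fn(env)

def _value(x, env):
    # Evaluate sub-expressions; pass scalars/arrays through unchanged.
    return x._fn(env) if isinstance(x, Lazy) else x

def var(name):
    return Lazy(lambda env, n=name: env[n])

x, y = var('x'), var('y')
expr = 3 * x + 4 * y               # builds the tree; no arithmetic yet
z = expr.eval(x=np.arange(3), y=np.ones(3))
# z is array([ 4., 7., 10.])
```

Because the tree is only materialised in `eval`, a symbolic engine could also rewrite or integrate it first, as in the pseudo-Sage example above.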
Re: [Numpy-discussion] Singular Matrix problem with Matplotlib in Numpy (Windows - AMD64)
John Hunter wrote:
> Andrew, since you are the original author of the isnan port, could you patch the branch and the trunk to take care of this?
> JDH

Done in r6791 and r6792. Sorry for the trouble. Now I just hope we don't get a problem with long long, although now, if _ISOC99_SOURCE is defined, we'll preferentially use int64_t out of stdint.h, so I should think this is more portable on sane platforms. This is one of many reasons why I stick to Python...

-Andrew

> On Fri, Jan 16, 2009 at 8:07 AM, George g...@emss.co.za wrote:
> [...]
Re: [Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator
Francesc Alted schrieb:

> A Friday 16 January 2009, Gregor Thalhammer escrigué:
>> I also gave a try to the vector math library (VML) contained in Intel's Math Kernel Library. ... This gave me a big performance gain, e.g., 2.3x (5x) for sin, 6x (10x) for exp, 7x (15x) for pow, and 3x (6x) for division (no gain for add, sub, mul).
> Wow, pretty nice speed-ups indeed! In fact I was thinking of including support for threading in Numexpr (I don't think it would be too difficult, but let's see). BTW, do you know how VML is able to achieve a speedup of 6x for a sin() function? I suppose this is because they are using SSE instructions, but are these also available for 64-bit double precision items?

I am not an expert on SSE instructions, but to my knowledge there exists (in the Core 2 architecture) no SSE instruction to calculate the sin. But it seems to be possible to (approximately) calculate a sin with a couple of multiplication/addition instructions (and these exist in SSE for 64-bit floats). Intel (and AMD) seems to use a more clever algorithm, implemented more efficiently than the standard implementation.

> Well, if you can provide the code, I'd be glad to include it in numexpr. The only requirement is that VML must be optional during the build of the package.

Yes, I will try to provide you with a polished version of my changes, making them optional.

>> There is one "but": VML supports (at the moment) only math on contiguous arrays. At a first try I didn't understand how to enforce this limitation in numexpr.
> No problem. At the end of numexpr/necompiler.py you will see some code like:
>
>     # All the opcodes can deal with strided arrays directly as
>     # long as they are unidimensional (strides in other
>     # dimensions are dealt with within the extension), so we don't
>     # need a copy for the strided case.
>     if not b.flags.aligned:
>         ...
> which you can replace with something like:
>
>     # need a copy for the strided case
>     if VML_available and not b.flags.contiguous:
>         b = b.copy()
>     elif not b.flags.aligned:
>         ...
>
> That would be enough for ensuring that all the arrays are contiguous when they hit numexpr's virtual machine.

Ah, I see; that's not difficult. I thought the copying was done in the virtual machine (I didn't read all the code ...).

> This being said, it is a shame that VML does not have support for strided/unaligned arrays. They are quite common beasts, especially when you work with heterogeneous arrays (aka record arrays).

I have the impression that you can already feel happy if these mathematical libraries support a C interface, not only Fortran. At least the Intel VML provides functions to pack/unpack strided arrays, which seem to work on a broader parameter range than specified (also zero or negative step sizes).

>> I also gave a quick try to the equivalent vector math library from AMD, acml_mv. ... Especially with VML, a larger block size of 4096 instead of 128 made it possible to use multithreading in VML efficiently.
> Experimentation. Back in 2006 David found that 128 was optimal for the processors available at that time. With Numexpr 1.1 my experiments show that 256 is a better value for current Core2 processors and most expressions in our benchmark bed (see the benchmarks/ directory); hence, 256 is the new value for the chunksize in 1.1. However, bear in mind that 256 has to be multiplied by the itemsize of each array, so the chunksize is currently 2048 bytes for 64-bit items (int64 or float64) and 4096 for double precision complex arrays, which are probably the sizes that have to be compared with VML.

So the optimum block size might depend on the type of expression and on whether VML functions are used.
One question: the block size is set by a #define; would there be significantly poorer performance if you used a variable instead? That would be more flexible, especially for testing and tuning.

>> I was missing the support for single precision floats.
>
> Yeah. This is because nobody has implemented it before, but it is
> completely doable.

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
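[Editorial note: the contiguity restriction discussed in this thread is easy to reproduce with a record array. The following is a minimal illustrative sketch, not numexpr's actual code.]

```python
import numpy as np

# A field of a record array is a strided view: its stride (16 bytes)
# is larger than its itemsize (8 bytes), so it is not C-contiguous.
a = np.zeros(1000, dtype=[("x", np.float64), ("y", np.float64)])
x = a["x"]
print(x.flags.c_contiguous)   # False

# The fix discussed above: copy non-contiguous operands before passing
# them to a library (such as VML) that accepts only contiguous data.
if not x.flags.c_contiguous:
    x = x.copy()
print(x.flags.c_contiguous)   # True
```

This is exactly the kind of "common beast" the thread refers to: the copy costs memory bandwidth, which is why native strided support in the vector library would be preferable.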
Re: [Numpy-discussion] Help with interpolating missing values from a 3D scanner
Thanks for all the ideas. I think I will look into the scikits.delaunay, Rbf, or Gaussian smoothing approach. My best idea is similar to the Gaussian smoothing. Anyway, all of the missing data gaps seem to be small enough that I expect any of these methods to accomplish my purpose. I have read some of the work on PDEs and in-painting, but I think it is overkill for this particular application. I will let you know how it works.

Thanks again,
Dave
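[Editorial note: the Gaussian-smoothing approach mentioned above can be sketched with normalized convolution, assuming the scanner data lies on a regular 2-D grid with NaNs marking the gaps. The helper name fill_gaps and the sigma value are made up for illustration.]

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fill_gaps(z, sigma=2.0):
    """Fill NaN gaps in a 2-D grid by normalized Gaussian smoothing."""
    mask = np.isfinite(z)
    z0 = np.where(mask, z, 0.0)
    smoothed = gaussian_filter(z0, sigma)                  # weighted data sum
    weights = gaussian_filter(mask.astype(float), sigma)   # sum of weights
    filled = z.copy()
    with np.errstate(invalid="ignore", divide="ignore"):
        estimate = smoothed / weights
    filled[~mask] = estimate[~mask]   # replace only the missing samples
    return filled
```

Small gaps receive a locally weighted average of their neighbours, while valid samples are left untouched, which matches the "small enough gaps" situation described above.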
[Numpy-discussion] Please don't use google code for hosting
Hi,

I am just visiting colleagues in the Cuban Neuroscience Center, and of course I'm trying to persuade them that Python and open-source are the way forward. This is made more difficult because several projects - for example pyglet - have their repositories on Google code.

Google, unlike any other hoster I've come across, bans code downloads from Cuban addresses with a message that I am trying to download from 'a forbidden country'. I assume they are playing safe with code export regulations. This breaks easy_install and direct downloads.

I hope you agree with me that it serves nobody's interest to reduce collaboration with Cuban scientists on free software. So, please, if you are considering google code for hosting, consider other options.

Best,
Matthew
Re: [Numpy-discussion] Please don't use google code for hosting
On Fri, Jan 16, 2009 at 7:07 PM, Matthew Brett matthew.br...@gmail.com wrote:
> So, please, if you are considering google code for hosting, consider
> other options.

Seems odd that you'd post that from a gmail account. I do sympathize with your suggestion, but I don't have a better alternative to Google code for my project. Maybe it would be best to address this limitation with Google directly.

-Kevin Jacobs
Re: [Numpy-discussion] Please don't use google code for hosting
On Fri, 16 Jan 2009 19:24:56 -0500, Kevin Jacobs jac...@bioinformed.com wrote:
> On Fri, Jan 16, 2009 at 7:07 PM, Matthew Brett matthew.br...@gmail.com wrote:
>> So, please, if you are considering google code for hosting, consider
>> other options.
>
> Seems odd that you'd post that from a gmail account. I do sympathize
> with your suggestion, but I don't have a better alternative to Google
> code for my project. Maybe it would be best to address this limitation
> with Google directly.

I have to worry a bit about export control laws at work, and I don't remember what Cuba's status is, but I do know that for a small number of countries (North Korea, Iran, and maybe Cuba) exporting anything of value is a potentially serious crime with prison time, so I can fully appreciate why Google would be careful. The US does prosecute, and it has levied amazingly large fines against companies in recent years. It's quite possible those restrictions for Cuba will be loosened in the next few months, from what I have read recently.

--
------------------------------------------------------------------
| Alan K. Jackson   | To see a World in a Grain of Sand          |
| a...@ajackson.org | And a Heaven in a Wild Flower,             |
| www.ajackson.org  | Hold Infinity in the palm of your hand     |
| Houston, Texas    | And Eternity in an hour. - Blake           |
------------------------------------------------------------------
Re: [Numpy-discussion] Please don't use google code for hosting
On Fri, Jan 16, 2009 at 18:07, Matthew Brett matthew.br...@gmail.com wrote:
> Hi,
>
> I am just visiting colleagues in the Cuban Neuroscience Center, and of
> course I'm trying to persuade them that Python and open-source are the
> way forward. This is made more difficult because several projects -
> for example pyglet - have their repositories on Google code. Google,
> unlike any other hoster I've come across, bans code downloads from
> Cuban addresses with a message that I am trying to download from 'a
> forbidden country'. I assume they are playing safe with code export
> regulations. This breaks easy_install and direct downloads. I hope you
> agree with me that it serves nobody's interest to reduce collaboration
> with Cuban scientists on free software. So, please, if you are
> considering google code for hosting, consider other options.

As a workaround, you can ask the authors of the code your colleagues are interested in to place their release tarballs on PyPI in addition to Google Code (the caveat being the 10MB/file limit imposed by the admins. Complain to them, too!). For SVN access, you can probably set up a bzr mirror on Launchpad for them.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
[Numpy-discussion] tensor contractions
Suppose I have a 3d array, A, with dimensions 2 x 2 x N, and a 2d 2 x N array, u. I interpret A as N 2x2 matrices and u as N 2d vectors. Suppose I want to apply the mth matrix to the mth vector, i.e.

    dot(A[:, :, m], u[:, m]) = v[:, m]

Aside from doing

    v[0, :] = A[0, 0, :] * u[0, :] + A[0, 1, :] * u[1, :]
    v[1, :] = A[1, 0, :] * u[0, :] + A[1, 1, :] * u[1, :]

is there a smart way to perform this computation?

-gideon
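[Editorial note: the hand-written sums above can be vectorized in one line. np.einsum postdates this 2009 thread (it was added in NumPy 1.6), so this is a modern sketch rather than the answer available at the time.]

```python
import numpy as np

N = 5
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2, N))   # N stacked 2x2 matrices
u = rng.standard_normal((2, N))      # N stacked 2-vectors

# One-liner with einsum: v[i, m] = sum_j A[i, j, m] * u[j, m]
v = np.einsum("ijm,jm->im", A, u)

# Equivalent broadcasting form, closer to the hand-written sums:
# A * u[newaxis] has shape (2, 2, N); summing over axis 1 contracts j.
v2 = (A * u[np.newaxis, :, :]).sum(axis=1)

print(np.allclose(v, v2))  # True
```

Both forms avoid any Python-level loop over the N matrices; einsum makes the index contraction explicit, while the broadcasting version mirrors the two expanded equations in the question.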
Re: [Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator
Note that Apple has a similar library called vForce:

http://developer.apple.com/ReleaseNotes/Performance/RN-vecLib/index.html
http://developer.apple.com/documentation/Performance/Conceptual/vecLib/Reference/reference.html

I think these libraries use several techniques and are not necessarily dependent on SSE. The Apple versions appear to support only float and double (no complex), and I don't see anything about strided arrays. At one point I thought there was talk of adding support for vForce into the respective ufuncs; I don't know if anybody followed up on that.

On 2009-01-16, at 10:48, Francesc Alted wrote:
> Wow, pretty nice speed-ups indeed! In fact I was thinking of including
> support for threading in Numexpr (I don't think it would be too
> difficult, but let's see). BTW, do you know how VML is able to achieve
> a speedup of 6x for a sin() function? I suppose this is because they
> are using SSE instructions, but are these also available for 64-bit
> double precision items?
Re: [Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator
2009/1/16 Gregor Thalhammer gregor.thalham...@gmail.com:
> Francesc Alted wrote:
>> Wow, pretty nice speed-ups indeed! In fact I was thinking of
>> including support for threading in Numexpr (I don't think it would
>> be too difficult, but let's see). BTW, do you know how VML is able
>> to achieve a speedup of 6x for a sin() function? I suppose this is
>> because they are using SSE instructions, but are these also
>> available for 64-bit double precision items?
>
> I am not an expert on SSE instructions, but to my knowledge there
> exists (in the Core 2 architecture) no SSE instruction to calculate
> the sin. But it seems to be possible to (approximately) calculate a
> sin with a couple of multiplication/addition instructions (and they
> exist in SSE for 64-bit floats). Intel (and AMD) seem to use a more
> clever algorithm,

Here is the lib I use for the transcendental functions' SSE implementations: http://gruntthepeon.free.fr/ssemath/ (only single precision float though).

--
Olivier
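[Editorial note: the "couple of multiplication/addition instructions" mentioned in this thread is essentially polynomial evaluation. The sketch below uses plain Taylor coefficients on a reduced range for simplicity; libraries like ssemath and VML use minimax coefficients plus range reduction, and evaluate the polynomial with packed SSE multiply/add instructions.]

```python
import numpy as np

def sin_poly(x):
    # Degree-7 odd polynomial for sin(x), valid on roughly [-pi/4, pi/4].
    # Horner form: just a handful of multiplies and adds, which is what
    # SSE can do on 2 doubles (or 4 floats) at once.
    x2 = x * x
    return x * (1.0 + x2 * (-1.0/6.0 + x2 * (1.0/120.0 - x2 / 5040.0)))

x = np.linspace(-np.pi / 4, np.pi / 4, 1001)
err = np.max(np.abs(sin_poly(x) - np.sin(x)))
print(err)  # on the order of 1e-7
```

The next neglected Taylor term, x^9/9!, bounds the error at about 3e-7 on this interval, which illustrates why a short multiply/add sequence suffices once the argument has been range-reduced.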