Re: [Numpy-discussion] automatically avoiding temporary arrays
Hi,

On Mon, Feb 27, 2017 at 11:46 AM, Benjamin Root wrote:
> Ah, that wheel would be a huge help. Most of the packages I have as
> dependencies for this project were compiled against v1.10, so I am hoping
> that there won't be too big of a problem.

I don't think so - the problems generally arise when you try to import a
package compiled against a more recent version of numpy than the one you
have installed.

> Are the manylinux wheels still compatible with CentOS5, or has the base
> image been bumped to CentOS6?

Yup, still CentOS5. When it shifts to CentOS6, probably not very soon,
you'll likely see a version number bump to manylinux2 (manylinux1 is the
current version).

Cheers,

Matthew
Re: [Numpy-discussion] automatically avoiding temporary arrays
Ah, that wheel would be a huge help. Most of the packages I have as
dependencies for this project were compiled against v1.10, so I am hoping
that there won't be too big of a problem. Are the manylinux wheels still
compatible with CentOS5, or has the base image been bumped to CentOS6?

Cheers!
Ben Root

On Mon, Feb 27, 2017 at 2:31 PM, Matthew Brett wrote:
> Hi,
>
> On Mon, Feb 27, 2017 at 11:27 AM, Charles R Harris wrote:
> >
> > On Mon, Feb 27, 2017 at 11:43 AM, Benjamin Root wrote:
> >>
> >> What's the timeline for the next release? I have the perfect use case
> >> for this (Haversine calculation on large arrays that takes up ~33% of
> >> one of my processing scripts). However, to test it out, I have a huge
> >> dependency mess to wade through first, and there are no resources
> >> devoted to that project for at least a few weeks. I want to make sure
> >> I get feedback to y'all.
> >
> > I'd like to branch 1.13.x at the end of March. The planned features
> > that still need to go in are the `__array_ufunc__` work and the
> > `lapack_lite` update. The first RC should not take much longer. I
> > believe Matthew is building wheels for testing on the fly but I don't
> > know where you can find them.
>
> Latest wheels at:
> https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com/
>
> PRE_URL=https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com
> pip install -f $PRE_URL --pre numpy
>
> Cheers,
>
> Matthew
Re: [Numpy-discussion] automatically avoiding temporary arrays
Hi,

On Mon, Feb 27, 2017 at 11:27 AM, Charles R Harris wrote:
>
> On Mon, Feb 27, 2017 at 11:43 AM, Benjamin Root wrote:
>>
>> What's the timeline for the next release? I have the perfect use case
>> for this (Haversine calculation on large arrays that takes up ~33% of
>> one of my processing scripts). However, to test it out, I have a huge
>> dependency mess to wade through first, and there are no resources
>> devoted to that project for at least a few weeks. I want to make sure I
>> get feedback to y'all.
>
> I'd like to branch 1.13.x at the end of March. The planned features that
> still need to go in are the `__array_ufunc__` work and the `lapack_lite`
> update. The first RC should not take much longer. I believe Matthew is
> building wheels for testing on the fly but I don't know where you can
> find them.

Latest wheels at:
https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com/

PRE_URL=https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com
pip install -f $PRE_URL --pre numpy

Cheers,

Matthew
Re: [Numpy-discussion] automatically avoiding temporary arrays
On Mon, Feb 27, 2017 at 11:43 AM, Benjamin Root wrote:
> What's the timeline for the next release? I have the perfect use case
> for this (Haversine calculation on large arrays that takes up ~33% of
> one of my processing scripts). However, to test it out, I have a huge
> dependency mess to wade through first, and there are no resources
> devoted to that project for at least a few weeks. I want to make sure I
> get feedback to y'all.

I'd like to branch 1.13.x at the end of March. The planned features that
still need to go in are the `__array_ufunc__` work and the `lapack_lite`
update. The first RC should not take much longer. I believe Matthew is
building wheels for testing on the fly but I don't know where you can find
them.

Chuck
Re: [Numpy-discussion] automatically avoiding temporary arrays
On Wed, Oct 5, 2016 at 5:36 PM, Robert McLeod wrote:
>
> It's certainly true that numexpr doesn't create a lot of OP_COPY
> operations, rather it's optimized to minimize them, so probably it's
> fewer ops than naive successive calls to numpy within python, but I'm
> unsure if there's any difference in operation count between a
> hand-optimized numpy with out= set and numexpr. Numexpr just does it for
> you.

That was my understanding as well. If it automatically does what one could
achieve by carrying the state along in the 'out' parameter, that's as good
as it can get in terms of removing unnecessary ops. There are other
speedup opportunities of course, but that's a separate matter.

> This blog post from Tim Hochberg is useful for understanding the
> performance advantages of blocking versus multithreading:
>
> http://www.bitsofbits.com/2014/09/21/numpy-micro-optimization-and-numexpr/

Hadn't come across that one before. Great link. Thanks. Using caches and
vector registers well trumps threading, unless one has a lot of data and
it helps to disable hyper-threading.
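A toy sketch of the blocking idea from that post, for the curious (the
function, block size, and array names here are illustrative, not code from
numexpr):

import numpy as np

def blocked_eval(a, b, c, block=2**14):
    # Walk the arrays in cache-sized chunks so each chunk stays hot
    # in L1/L2 across the whole chain of operations.
    r = np.empty_like(a)
    for i in range(0, len(a), block):
        s = slice(i, i + block)
        np.multiply(a[s], b[s], out=r[s])  # r[s] = a[s] * b[s]
        np.add(r[s], c[s], out=r[s])       # r[s] += c[s]
    return r

x, y, z = (np.random.rand(10**6) for _ in range(3))
assert np.allclose(blocked_eval(x, y, z), x * y + z)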
Re: [Numpy-discussion] automatically avoiding temporary arrays
On Wed, Oct 5, 2016 at 1:11 PM, srean wrote:
> Thanks Francesc, Robert for giving me a broader picture of where this
> fits in. I believe numexpr does not handle slicing, so that might be
> another thing to look at.

Dereferencing would be relatively simple to add into numexpr, as it would
just be some getattr() calls. Personally I will add that at some point
because it will clean up my code. Slicing, maybe only for contiguous
blocks in memory? I.e. imageStack[0,:,:] would be possible, but
imageStack[:, ::2, ::2] would not be trivial (I think...).

I seem to remember someone asked David Cooke about slicing and he said
something along the lines of, "that's what Numba is for." Perhaps NumPy
back-ended by Numba is more so what you are looking for, as it hooks into
the byte compiler? The main advantage of numexpr is that a series of numpy
functions can be enclosed in ne.evaluate("") and it provides a big
acceleration for little programmer effort, but it's not nearly as
sophisticated as Numba or PyPy.

> On Wed, Oct 5, 2016 at 4:26 PM, Robert McLeod wrote:
>>
>> As Francesc said, Numexpr is going to get most of its power through
>> grouping a series of operations so it can send blocks to the CPU cache
>> and run the entire series of operations on the cache before returning
>> the block to system memory. If it was just used to back-end NumPy, it
>> would only gain from the multi-threading portion inside each function
>> call.
>
> Is that so?
>
> I thought numexpr also cuts down on the number of temporary buffers that
> get filled (in other words copy operations) if the same expression was
> written as a series of operations. My understanding can be wrong, and I
> would appreciate correction.
>
> The 'out' parameter in ufuncs can eliminate extra temporaries but it's
> not composable. Right now I have to manually carry along the array where
> the in-place operations take place. I think the goal here is to
> eliminate that.

The numexpr virtual machine does create temporaries where needed when it
parses the abstract syntax tree for all the operations it has to do. I
believe the main advantage is that the temporaries are created in the CPU
cache, and not in system memory.

It's certainly true that numexpr doesn't create a lot of OP_COPY
operations, rather it's optimized to minimize them, so probably it's fewer
ops than naive successive calls to numpy within python, but I'm unsure if
there's any difference in operation count between a hand-optimized numpy
with out= set and numexpr. Numexpr just does it for you.

This blog post from Tim Hochberg is useful for understanding the
performance advantages of blocking versus multithreading:

http://www.bitsofbits.com/2014/09/21/numpy-micro-optimization-and-numexpr/

Robert

--
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch
robbmcl...@gmail.com
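Since slicing is not available inside the expression string, one
workaround (sketched here with made-up array names) is to take the slice
in numpy first and pass the resulting view to numexpr, which picks up
plain variables from the calling frame:

import numpy as np
import numexpr as ne

imageStack = np.random.rand(4, 64, 64)
background = np.random.rand(64, 64)

frame = imageStack[0, :, :]                  # contiguous slice, taken in numpy
cleaned = ne.evaluate("frame - background")  # names resolved from local scope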
Re: [Numpy-discussion] automatically avoiding temporary arrays
Thanks Francesc, Robert for giving me a broader picture of where this fits
in. I believe numexpr does not handle slicing, so that might be another
thing to look at.

On Wed, Oct 5, 2016 at 4:26 PM, Robert McLeod wrote:
>
> As Francesc said, Numexpr is going to get most of its power through
> grouping a series of operations so it can send blocks to the CPU cache
> and run the entire series of operations on the cache before returning
> the block to system memory. If it was just used to back-end NumPy, it
> would only gain from the multi-threading portion inside each function
> call.

Is that so?

I thought numexpr also cuts down on the number of temporary buffers that
get filled (in other words copy operations) if the same expression was
written as a series of operations. My understanding can be wrong, and I
would appreciate correction.

The 'out' parameter in ufuncs can eliminate extra temporaries but it's not
composable. Right now I have to manually carry along the array where the
in-place operations take place. I think the goal here is to eliminate
that.
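A sketch of the manual bookkeeping being described (array names are
illustrative): the same scratch buffer has to be threaded through every
ufunc call by hand, which is what makes out= awkward to compose:

import numpy as np

a, b, c = (np.random.rand(10**6) for _ in range(3))

# Expression form: r = a*b + c allocates a temporary for a*b.
# Hand-composed form: one buffer, carried along explicitly.
acc = np.empty_like(a)
np.multiply(a, b, out=acc)  # acc = a*b, no temporary
np.add(acc, c, out=acc)     # acc += c, still no temporary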
Re: [Numpy-discussion] automatically avoiding temporary arrays
All,

On Wed, Oct 5, 2016 at 11:46 AM, Francesc Alted wrote:
> 2016-10-05 8:45 GMT+02:00 srean:
>
>> Good discussion, but was surprised by the absence of numexpr in the
>> discussion, given how relevant it (numexpr) is to the topic.
>>
>> Is the goal to fold in the numexpr functionality (and beyond) into
>> Numpy?
>
> Yes, the question about merging numexpr into numpy has been something
> that periodically shows up in this list. I think mostly everyone agrees
> that it is a good idea, but things are not so easy, and so far nobody
> has provided a good patch for this. Also, the fact that numexpr relies
> on grouping an expression by using a string (e.g.
> y = ne.evaluate("x**3 + tanh(x**2) + 4")) does not play well with the
> way in which numpy evaluates expressions, so something should be
> suggested to cope with this too.

As Francesc said, Numexpr is going to get most of its power through
grouping a series of operations so it can send blocks to the CPU cache and
run the entire series of operations on the cache before returning the
block to system memory. If it was just used to back-end NumPy, it would
only gain from the multi-threading portion inside each function call. I'm
not sure how one would go about grouping successive numpy expressions
without modifying the Python interpreter?

I put a bit of effort into extending numexpr to use 4-byte word opcodes
instead of 1-byte. Progress has been very slow, however, due to time
constraints, but I have most of the numpy data types (u[1-4], i[1-4],
f[4,8], c[8,16], S[1-4], U[1-4]). On Tuesday I finished writing a Python
generator script that writes all the C-side opcode macros for opcodes.hpp.
Now I have about 900 opcodes, and this could easily grow into thousands if
more functions are added, so I also built a reverse lookup tree (based on
collections.defaultdict) for the Python-side of numexpr.

Robert

--
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch
robbmcl...@gmail.com
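A toy illustration of a reverse lookup tree built on
collections.defaultdict; the keys and the opcode value are made up for the
example and are not numexpr's actual tables:

from collections import defaultdict

def _tree():
    # A dictionary whose missing keys create fresh subtrees on demand.
    return defaultdict(_tree)

op_tree = _tree()

# Register a hypothetical opcode for add(float64, float64):
op_tree['add']['f8']['f8'] = 0x0123  # invented opcode number

# Lookup walks the same path:
opcode = op_tree['add']['f8']['f8']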
Re: [Numpy-discussion] automatically avoiding temporary arrays
Note that numpy does store some larger arrays already, in the fft module.
(In fact, this was a cache of unlimited size until #7686.) It might not be
bad if the same cache were used more generally.

That said, if newer versions of python are offering ways of doing this
better, maybe that is the best way forward.

-- Marten
Re: [Numpy-discussion] automatically avoiding temporary arrays
Mon, 03 Oct 2016 15:07:28 -0400, Benjamin Root wrote:
> With regards to arguments about holding onto large arrays, I would like
> to emphasize that my original suggestion mentioned weakref'ed numpy
> arrays.
> Essentially, the idea is to claw back only the raw memory blocks during
> that limbo period between discarding the numpy array python object and
> when python garbage-collects it.

CPython afaik deallocates immediately when the refcount hits zero. It's
relatively rare that you have arrays hanging around waiting for cycle
breaking by gc. If you have them hanging around, I don't think it's
possible to distinguish these from other arrays without running the gc.

Note also that an "is an array in use" check probably always requires
Julian's stack-based hack, since you cannot rely on the refcount.

Pauli
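The refcount behavior is easy to see from Python (CPython-specific;
sys.getrefcount counts its own argument as one extra reference):

import sys
import numpy as np

a = np.arange(10)
print(sys.getrefcount(a))  # 2: the name 'a' plus the call argument

b = a
print(sys.getrefcount(a))  # 3

del b
del a  # refcount hits zero: the buffer is freed immediately,
       # with no need to wait for a gc cycle pass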
Re: [Numpy-discussion] automatically avoiding temporary arrays
With regards to arguments about holding onto large arrays, I would like to
emphasize that my original suggestion mentioned weakref'ed numpy arrays.

Essentially, the idea is to claw back only the raw memory blocks during
that limbo period between discarding the numpy array python object and
when python garbage-collects it.

Ben Root

On Mon, Oct 3, 2016 at 2:43 PM, Julian Taylor wrote:
> On 03.10.2016 20:23, Chris Barker wrote:
> >
> > On Mon, Oct 3, 2016 at 3:16 AM, Julian Taylor
> > <jtaylor.deb...@googlemail.com> wrote:
> >
> >     the problem with this approach is that we don't really want numpy
> >     hogging on to hundreds of megabytes of memory by default so it
> >     would need to be a user option.
> >
> > indeed -- but one could set an LRU cache to be very small (few items,
> > not small memory), and then it gets used within expressions, but
> > doesn't hold on to much outside of expressions.
>
> numpy doesn't see the whole expression so we can't really do much.
> (technically we could in 3.5 by using pep 523, but that would be a
> larger undertaking)
>
> > However, is the allocation the only (or even biggest) source of the
> > performance hit?
>
> on large arrays the allocation is insignificant. What does cost some
> time is faulting the memory into the process, which implies writing
> zeros into the pages (a page at a time as it is being used).
> By storing memory blocks in numpy we would save this portion. This is
> really the job of the libc, but these are usually tuned for general
> purpose workloads and thus tend to give memory back to the system much
> earlier than numerical workloads would like.
>
> Note that numpy already has a small memory block cache but it's only
> used for very small arrays where the allocation cost itself is
> significant; it is limited to a couple megabytes at most.
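A toy, pure-Python sketch of the matching-size LRU idea; it uses explicit
take/give calls instead of the weakref/limbo mechanism proposed above,
since that part depends on CPython internals, and all names here are
invented for illustration:

from collections import OrderedDict

import numpy as np

class BlockCache:
    # LRU pool of reusable buffers keyed by (shape, dtype).
    def __init__(self, maxitems=4):
        self._pool = OrderedDict()
        self._maxitems = maxitems

    def take(self, shape, dtype):
        key = (tuple(shape), np.dtype(dtype).str)
        if key in self._pool:
            return self._pool.pop(key)  # reuse a cached block
        return np.empty(shape, dtype)   # miss: allocate as usual

    def give(self, arr):
        # Hand a no-longer-needed temporary back to the pool.
        key = (arr.shape, arr.dtype.str)
        self._pool[key] = arr
        self._pool.move_to_end(key)
        if len(self._pool) > self._maxitems:
            self._pool.popitem(last=False)  # evict least recently used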
Re: [Numpy-discussion] automatically avoiding temporary arrays
On 03.10.2016 20:23, Chris Barker wrote:
>
> On Mon, Oct 3, 2016 at 3:16 AM, Julian Taylor
> <jtaylor.deb...@googlemail.com> wrote:
>
>     the problem with this approach is that we don't really want numpy
>     hogging on to hundreds of megabytes of memory by default so it would
>     need to be a user option.
>
> indeed -- but one could set an LRU cache to be very small (few items,
> not small memory), and then it gets used within expressions, but doesn't
> hold on to much outside of expressions.

numpy doesn't see the whole expression so we can't really do much.
(technically we could in 3.5 by using pep 523, but that would be a larger
undertaking)

> However, is the allocation the only (or even biggest) source of the
> performance hit?

on large arrays the allocation is insignificant. What does cost some time
is faulting the memory into the process, which implies writing zeros into
the pages (a page at a time as it is being used).
By storing memory blocks in numpy we would save this portion. This is
really the job of the libc, but these are usually tuned for general
purpose workloads and thus tend to give memory back to the system much
earlier than numerical workloads would like.

Note that numpy already has a small memory block cache but it's only used
for very small arrays where the allocation cost itself is significant; it
is limited to a couple megabytes at most.
Re: [Numpy-discussion] automatically avoiding temporary arrays
On Mon, Oct 3, 2016 at 3:16 AM, Julian Taylor wrote:
> the problem with this approach is that we don't really want numpy
> hogging on to hundreds of megabytes of memory by default so it would
> need to be a user option.

indeed -- but one could set an LRU cache to be very small (few items, not
small memory), and then it gets used within expressions, but doesn't hold
on to much outside of expressions.

However, is the allocation the only (or even biggest) source of the
performance hit?

If you generate a temporary as a result of an operation, rather than doing
it in-place, that temporary needs to be allocated, but it also means that
an additional array needs to be pushed through the processor -- and that
can make a big performance difference too.

I'm not entirely sure how to profile this correctly, but this seems to
indicate that the allocation is cheap compared to the operations (for a
million-element array):

* Regular old temporary creation

In [24]: def f1(arr1, arr2):
    ...:     result = arr1 + arr2
    ...:     return result

In [26]: %timeit f1(arr1, arr2)
1000 loops, best of 3: 1.13 ms per loop

* Completely in-place, no allocation of an extra array

In [27]: def f2(arr1, arr2):
    ...:     arr1 += arr2
    ...:     return arr1

In [28]: %timeit f2(arr1, arr2)
1000 loops, best of 3: 755 µs per loop

So that's about 30% faster.

* Allocate a temporary that isn't used -- but should catch the creation
cost

In [29]: def f3(arr1, arr2):
    ...:     result = np.empty_like(arr1)
    ...:     arr1 += arr2
    ...:     return arr1

In [30]: %timeit f3(arr1, arr2)
1000 loops, best of 3: 756 µs per loop

Only a µs slower!

Profiling is hard, and I'm not good at it, but this seems to indicate that
the allocation is cheap.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception

chris.bar...@noaa.gov
Re: [Numpy-discussion] automatically avoiding temporary arrays
The same idea was published two years ago:

http://hiperfit.dk/pdf/Doubling.pdf

On Mon, Oct 3, 2016 at 8:53 AM, wrote:

> Message: 1
> Date: Sun, 2 Oct 2016 09:10:45 -0400
> From: Benjamin Root
> Subject: Re: [Numpy-discussion] automatically avoiding temporary arrays
>
> Just thinking aloud, an idea I had recently takes a different approach.
> The problem with temporaries isn't so much that they exist, but rather
> that they keep on being malloc'ed and cleared. What if numpy kept a
> small LRU cache of weakref'ed temporaries? Whenever a new numpy array is
> requested, numpy could see if there is already one in its cache of
> matching size and use it. If you think about it, expressions that result
> in many temporaries would quite likely have many of them being the same
> size in memory.
>
> Don't know how feasible it would be to implement though.
>
> Cheers!
> Ben Root
>
> On Sat, Oct 1, 2016 at 2:38 PM, Chris Barker wrote:
>> Julian,
>>
>> This is really, really cool!
>>
>> I have been wanting something like this for years (over a decade? wow!),
>> but always thought it would require hacking the interpreter to
>> intercept operations. This is a really inspired idea, and could buy
>> numpy a lot of performance.
>>
>> I'm afraid I can't say much about the implementation details -- but
>> great work!
>>
>> -Chris
Re: [Numpy-discussion] automatically avoiding temporary arrays
On 30.09.2016 23:09, josef.p...@gmail.com wrote:
> On Fri, Sep 30, 2016 at 9:38 AM, Julian Taylor wrote:
>> hi,
>> Temporary arrays generated in expressions are expensive as they imply
>> extra memory bandwidth, which is the bottleneck in most numpy
>> operations.
>> For example:
>>
>> r = a + b + c
>>
>> creates the b + c temporary and then adds a to it.
>> This can be rewritten to be more efficient using inplace operations:
>>
>> r = b + c
>> r += a
>
> general question (I wouldn't understand the details even if I looked.)
>
> how is this affected by broadcasting and type promotion?
>
> Some of the main reasons that I don't like to use inplace operations in
> general is that I'm often not sure when type promotion occurs and when
> arrays expand during broadcasting.
>
> for example b + c is 1-D, a is 2-D, and r has the broadcasted shape.
> another case when I switch away from broadcasting is when b + c is int
> or bool and a is float. Thankfully, we get error messages for casting
> now.

the temporary is only avoided when the casting follows the safe rule, so
it should be the same as what you get without inplace operations. E.g.
float32-temporary + float64 will not be converted to the unsafe float32
+= float64 which a normal inplace operation would allow. But float64-temp
+ float32 is transformed.

Currently the only broadcasting that will be transformed is temporary +
scalar value, otherwise it will only work on matching array sizes. Though
there is not really anything that prevents full broadcasting, it's just
not implemented yet in the PR.

>> This saves some memory bandwidth and can speed up the operation by 50%
>> for very large arrays, or even more if the inplace operation allows it
>> to be completed entirely in the cpu cache.
>
> I didn't realize the difference can be so large. That would make
> streamlining some code worth the effort.
>
> Josef
>
>> The problem is that inplace operations are a lot less readable, so they
>> are often only used in well optimized code. But due to pythons
>> refcounting semantics we can actually do some inplace conversions
>> transparently.
>> If an operand in python has a reference count of one it must be a
>> temporary, so we can use it as the destination array. CPython itself
>> does this optimization for string concatenations.
>>
>> In numpy we have the issue that we can be called from the C-API
>> directly, where the reference count may be one for other reasons.
>> To solve this we can check the backtrace until the python frame
>> evaluation function. If there are only numpy and python functions in
>> between that and our entry point, we should be able to elide the
>> temporary.
>>
>> This PR implements this:
>> https://github.com/numpy/numpy/pull/7997
>>
>> It currently only supports Linux with glibc (which has reliable
>> backtraces via unwinding) and maybe MacOS, depending on how good their
>> backtrace is. On Windows the backtrace APIs are different and I don't
>> know them, but in theory it could also be done there.
>>
>> A problem is that checking the backtrace is quite expensive, so it
>> should only be enabled when the involved arrays are large enough for it
>> to be worthwhile. In my testing this seems to be around 180-300 KiB
>> sized arrays, basically where they start spilling out of the CPU L2
>> cache.
>>
>> I made a little crappy benchmark script to test this cutoff in this
>> branch:
>> https://github.com/juliantaylor/numpy/tree/elide-bench
>>
>> If you are interested you can run it with:
>> python setup.py build_ext -j 4 --inplace
>> ipython --profile=null check.ipy
>>
>> At the end it will plot the ratio between elided and non-elided
>> runtime. It should get larger than one around 180 KiB on most cpus.
>>
>> If no one points out some flaw in the approach, I'm hoping to get this
>> into the next numpy version.
>>
>> cheers,
>> Julian
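To make the casting rule concrete, here is a sketch (with invented array
names) of the manual rewrites that the elision is, and is not, allowed to
perform:

import numpy as np

a32 = np.ones(10**6, dtype=np.float32)
b64 = np.ones(10**6, dtype=np.float64)

# (b64 + b64) + a32: the float64 temporary may absorb the float32
# operand in place, because float32 -> float64 follows the safe rule.
t = b64 + b64
t += a32  # same result and dtype as b64 + b64 + a32

# (a32 + a32) + b64: the temporary is float32, and
# 'float32 += float64' would be an unsafe cast, so no elision;
# the float64 result is allocated as usual.
r = (a32 + a32) + b64
assert r.dtype == np.float64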
>> >> I made a little crappy benchmark script to test this cutoff in this branch: >> https://github.com/juliantaylor/numpy/tree/elide-bench >> >> If you are interested you can run it with: >> python setup.py build_ext -j 4 --inplace >> ipython --profile=null check.ipy >> >> At the end it will plot the ratio between elided and non-elided runtime. >> It should get larger than one around 180KiB on most cpus. >> >> If no one points out some flaw in the approach, I'm hoping to get this >> into the next numpy version. >> >> cheers, >> Julian >> >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > signature.asc Description: OpenPGP digital signature ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion