On 5 Jun 2014 23:58, "Terry Reedy" <tjre...@udel.edu> wrote:
>
> On 6/5/2014 4:51 PM, Nathaniel Smith wrote:
>
>> In fact, AFAICT it's 100% correct for libraries being called by
>> regular python code (which is why I'm able to quote benchmarks at you
>> :-)). The bytecode eval loop always holds a reference to all operands,
>> and then immediately DECREFs them after the operation completes. If
>> one of our arguments has no other references besides this one, then we
>> can be sure that it is a dead obj walking, and steal its corpse.
>>
>> But this has a fatal flaw: people are unreasonable creatures, and
>> sometimes they call Python libraries without going through ceval.c
>> :-(. It's legal for random C code to hold an array object with a
>> single reference count, and then call PyNumber_Add on it, and then
>> expect the original array object to still be valid. But who writes
>> code like that in practice? Well, Cython does. So, this is no-go.
>
> I understand that a lot of numpy/scipy code is compiled with Cython, so
> you really want the optimization to continue working when so compiled.
> Is there a simple change to Cython that would work, perhaps in
> coordination with a change to numpy? If so, you could get the result
> before 3.5 comes out.

Unfortunately we don't actually know whether Cython is the only culprit (such code *could* be written by hand), and even if we fixed Cython it would take some unknowable amount of time before all downstream users upgraded their Cythons. (It's pretty common for projects to check in Cython-generated .c files, and only regenerate when the Cython source actually gets modified.) Pretty risky for an optimization.

> I realized that there are other compilers than Cython and non-numpy
> code that could benefit, so that a more generic solution would also be
> good. In particular
>
> > Here's the idea. Take an innocuous expression like:
> >
> >    result = (a + b + c) / c
> >
> > This gets evaluated as:
> >
> >    tmp1 = a + b
> >    tmp2 = tmp1 + c
> >    result = tmp2 / c
> ...
> > There's an obvious missed optimization in this code, though, which is
> > that it keeps allocating new temporaries and throwing away old ones.
> > It would be better to just allocate a temporary once and re-use it:
> >    tmp1 = a + b
> >    tmp1 += c
> >    tmp1 /= c
> >    result = tmp1
>
> Could this transformation be done in the ast? And would that help?

I don't think it could be done in the ast because I don't think you can work with anonymous temporaries there. But, now that you mention it, it could be done on the fly in the implementation of the relevant opcodes. I.e., BINARY_ADD could do

    if (Py_REFCNT(left) == 1)
        result = PyNumber_InPlaceAdd(left, right);
    else
        result = PyNumber_Add(left, right);

Upside: all packages automagically benefit!

Potential downsides to consider:
- Subtle but real and user-visible change in Python semantics. I'd be a little nervous about whether anyone has implemented, say, an iadd with side effects such that you can tell whether a copy was made, even if the object being copied is immediately destroyed. Maybe this doesn't make sense though.
- Only works when left operand is the temporary ("remember that a*b+c is faster than c+a*b"), and only for arithmetic (no benefit for np.sin(a + b)).
  Probably does cover the majority of cases though.

> A prolonged discussion might be better on python-ideas. See what others say.

Yeah, I wasn't sure which list to use for this one, happy to move if it would work better.

-n
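
P.S. For anyone following along, here is a rough pure-Python sketch of the temporary-reuse idea discussed above. It is only an illustration, not how numpy or ceval.c do anything: `Buf`, `naive`, `reuse_temporary` and `count_allocations` are hypothetical names made up for this example, and `Buf` is a toy stand-in for an array that just counts how many buffers get allocated.

```python
class Buf:
    """Toy stand-in for an array (hypothetical); counts buffer allocations."""

    allocations = 0

    def __init__(self, values):
        Buf.allocations += 1
        self.values = list(values)

    # Regular binary operators allocate a fresh result buffer.
    def __add__(self, other):
        return Buf(x + y for x, y in zip(self.values, other.values))

    def __truediv__(self, other):
        return Buf(x / y for x, y in zip(self.values, other.values))

    # In-place operators overwrite the left operand's buffer instead.
    def __iadd__(self, other):
        for i, y in enumerate(other.values):
            self.values[i] += y
        return self

    def __itruediv__(self, other):
        for i, y in enumerate(other.values):
            self.values[i] /= y
        return self


def naive(a, b, c):
    # Evaluated as tmp1 = a + b; tmp2 = tmp1 + c; result = tmp2 / c
    return (a + b + c) / c


def reuse_temporary(a, b, c):
    # Allocate one temporary, then keep updating it in place.
    tmp = a + b
    tmp += c
    tmp /= c
    return tmp


def count_allocations(fn, *args):
    """Return (result, number of Buf objects allocated while running fn)."""
    before = Buf.allocations
    result = fn(*args)
    return result, Buf.allocations - before


a, b, c = Buf([1.0, 2.0]), Buf([3.0, 4.0]), Buf([2.0, 4.0])
r1, n1 = count_allocations(naive, a, b, c)
r2, n2 = count_allocations(reuse_temporary, a, b, c)
print(n1, n2)                # 3 allocations vs. 1
print(r1.values, r2.values)  # same values either way
```

The in-place version is only safe here because `tmp` is a temporary that nothing else refers to; that is exactly the property the `Py_REFCNT(left) == 1` check would be trying to establish, and exactly what breaks if some C caller holds the only reference and still expects the object to be unchanged.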