Re: [Python-Dev] Computed Goto dispatch for Python 2
Won't this need Python compiled with GCC 5.1 to have any effect? Which compiler version was used for the benchmark? The issue that negated most of the computed goto improvements (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39284) was only closed very recently (r212172, 9f4ec746affbde1).
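
For reference, the technique in question relies on GCC's labels-as-values extension: every opcode handler ends with its own indirect jump through a table of label addresses, instead of looping back to a single shared switch. A minimal standalone sketch of the pattern (illustrative only, not the actual ceval.c code):

    /* Minimal computed-goto dispatch loop using GCC's "labels as values"
       extension; illustrative only. */
    #include <stdio.h>

    enum { OP_INCR, OP_DECR, OP_HALT };

    static int
    run(const unsigned char *code)
    {
        /* one dispatch target per opcode; each handler jumps directly to
           the next handler, giving the branch predictor one indirect jump
           per opcode instead of a single shared one */
        static void *dispatch_table[] = { &&op_incr, &&op_decr, &&op_halt };
        const unsigned char *ip = code;
        int acc = 0;

    #define DISPATCH() goto *dispatch_table[*ip++]

        DISPATCH();
    op_incr:
        acc++;
        DISPATCH();
    op_decr:
        acc--;
        DISPATCH();
    op_halt:
        return acc;
    }

    int main(void)
    {
        const unsigned char prog[] = { OP_INCR, OP_INCR, OP_DECR, OP_HALT };
        printf("%d\n", run(prog)); /* prints 1 */
        return 0;
    }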
Re: [Python-Dev] Support for Linux perf
On 17.11.2014 23:09, Francis Giraldeau wrote:
> Hi, ... The PEP-418 is about performance counters, but there is no mention o[...]
>
> Anyway, I think we must change CPython to support tools such as perf. Any thoughts?

There are some patches available adding systemtap and dtrace probes, which should at least help with getting function-level profiles:
http://bugs.python.org/issue21590
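
For illustration, a static probe point of the kind those patches add looks roughly like this when using the SDT macros from systemtap's sys/sdt.h (the provider and probe names here are made up; the actual patch on issue 21590 defines its own):

    /* Sketch of a static userspace probe via systemtap's SDT macros; when
       no tracer is attached each probe compiles down to a single nop. */
    #include <sys/sdt.h>

    void
    call_function(const char *filename, const char *funcname)
    {
        DTRACE_PROBE2(python, function__entry, filename, funcname);

        /* ... execute the function ... */

        DTRACE_PROBE2(python, function__return, filename, funcname);
    }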
Re: [Python-Dev] Status of C compilers for Python on Windows
On 10.10.2014 14:05, Paul Moore wrote:
> On 10 October 2014 10:50, Victor Stinner victor.stin...@gmail.com wrote:
>
> Is MinGW fully compatible with MSVS ABI? I read that it reuses the MSVCRT, but I don't know if it's enough. I guess that full ABI compatibility means more than just using the C library: calling convention and much more.
>
> MinGW can be made to build ABI-compatible extensions. Whether this will continue with MSVC 15 I don't know, as it requires a change to add an interface library for the relevant msvcrXX runtime. And the MinGW community is somewhat fragmented these days, with the core project not supporting 64-bit and various external projects doing so. Having said all this, it *is* possible with some effort to use MinGW to build Python extensions. As noted, the numpy developers have done a lot of work on this, as some of the libraries they need must be built with mingw. And the state of distutils support for mingw is very sad as well, IIRC (last time I looked there were a number of open bugs and very little movement on them). Rather than put effort into more build options for CPython, I think it would be much more beneficial to the Windows community if effort was put into:
>
> 2. Looking at ways to support cross-compiling Windows extensions from Linux using mingw. I've no idea how practical this would be, but if Linux developers could provide Windows builds without having to maintain a Windows environment, that would be great.

It is practical. Numpy Windows binaries are built on Linux using mingw 3.4.5 and wine. The (vagrant-based) setup which is currently used is available here:
https://github.com/juliantaylor/numpy-vendor

For the next release we do want to look into providing official win64 binaries based on the mingw64 toolchain that has been mentioned a few times already. An attempt to do so for the last release failed due to test issues, and there were no experienced debuggers available to solve them.

From my perspective, cross-building for Windows is easier than cross-building for Mac, but that's probably just because I never seriously looked into the latter.
Re: [Python-Dev] sum(...) limitation - temporary elision take 2
On 04.08.2014 22:22, Jim J. Jewett wrote:
> Sat Aug 2 12:11:54 CEST 2014, Julian Taylor wrote (in https://mail.python.org/pipermail/python-dev/2014-August/135623.html ):
>
> Andrea Griffini agriff at tin.it wrote:
>
> However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.
>
> hm, could this be a pure Python case that would profit from temporary elision [ https://mail.python.org/pipermail/python-dev/2014-June/134826.html ]? Lists could declare the tp_can_elide slot and call list.extend on the temporary during their tp_add slot instead of creating a new temporary. extend/realloc can avoid the copy if there is free memory available after the block.
>
> Yes, with all the same problems. When dealing with a complex object, how can you be sure that __add__ won't need access to the original values during the entire computation? It works with matrix addition, but not with matrix multiplication. Depending on the details of the implementation, it could even fail for a sort of sliding-neighbor addition similar to the original justification.

The C-extension object knows what its add slot does. An object that cannot elide would simply always return 0, indicating to Python not to call the in-place variant. E.g. the numpy __matmul__ operator would never tell Python that it can work in place, but __add__ would (if the arguments allow it).

Though we may have found a way to do it without the direct help of Python: it involves reading and storing the current instruction of the frame object to figure out whether we are called directly from the interpreter. See the can_elide_temp function in this unfinished patch to numpy:
https://github.com/numpy/numpy/pull/4322.diff
Probably not the best way, as this is hardly intended Python C-API, but assuming there is no overlooked issue with this approach it could be a good workaround for known-good Python versions.
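
The trick alluded to above looks roughly like this (a simplified, hypothetical sketch rather than the actual pull request; it assumes that co_code[f_lasti] is the opcode the interpreter is currently executing, which holds for CPython's bytecode layout at the time):

    /* Rough sketch of deciding whether an operand is an interpreter
       temporary that may be modified in place; simplified illustration of
       the idea, not the numpy patch. */
    #include <Python.h>
    #include <frameobject.h>
    #include <opcode.h>

    static int
    can_elide_temp(PyObject *operand)
    {
        PyFrameObject *frame;
        unsigned char opcode;

        /* only safe when we hold the sole reference to the temporary */
        if (Py_REFCNT(operand) != 1)
            return 0;

        /* peek at the instruction currently being executed; if it is a
           plain binary arithmetic opcode, the operand came straight off
           the interpreter's value stack and nothing else can see it */
        frame = PyEval_GetFrame();
        if (frame == NULL)
            return 0;
        opcode = (unsigned char)PyBytes_AS_STRING(frame->f_code->co_code)[frame->f_lasti];
        return (opcode == BINARY_ADD || opcode == BINARY_SUBTRACT ||
                opcode == BINARY_MULTIPLY || opcode == BINARY_TRUE_DIVIDE);
    }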
Re: [Python-Dev] sum(...) limitation
On 02.08.2014 08:35, Terry Reedy wrote:
> On 8/2/2014 1:57 AM, Allen Li wrote:
>
> On Fri, Aug 01, 2014 at 02:51:54PM -0700, Guido van Rossum wrote:
>
> No. We just can't put all possible use cases in the docstring. :-)
>
> On Fri, Aug 1, 2014 at 2:48 PM, Andrea Griffini agr...@tin.it wrote:
>
> help(sum) tells clearly that it should be used to sum numbers and not strings, and with strings actually fails. However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists. Is this to be considered a bug? Can you explain the rationale behind this design decision?
>
> It seems terribly inconsistent. Why are only strings explicitly restricted from being sum()ed? sum() should either ban everything except numbers or accept everything that implements addition (duck typing).
>
> O(n**2) behavior, ''.join(strings) alternative.

Hm, could this be a pure Python case that would profit from temporary elision [0]? Lists could declare the tp_can_elide slot and call list.extend on the temporary during their tp_add slot instead of creating a new temporary. extend/realloc can avoid the copy if there is free memory available after the block.

[0] https://mail.python.org/pipermail/python-dev/2014-June/134826.html
Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes
On 06.06.2014 04:18, Sturla Molden wrote:
> On 05/06/14 22:51, Nathaniel Smith wrote:
>
> This gets evaluated as:
>
>     tmp1 = a + b
>     tmp2 = tmp1 + c
>     result = tmp2 / c
>
> All these temporaries are very expensive. Suppose that a, b, c are arrays with N bytes each, and N is large. For simple arithmetic like this, the costs are dominated by memory access. Allocating an N byte array requires the kernel to clear the memory, which incurs N bytes of memory traffic.
>
> It seems to be the case that a large portion of the run-time in Python code using NumPy can be spent in the kernel zeroing pages (which the kernel does for security reasons). I think this can also be seen as a 'malloc problem'. It comes about because each new NumPy array starts with a fresh buffer allocated by malloc. Perhaps buffers can be reused?
>
> Sturla

Caching memory inside of numpy would indeed solve this issue too. There has even been a paper written on this, which contains some more serious benchmarks than the laplace case, which runs on very old hardware (and the in-place and out-of-place cases are actually not the same: one computes array / scalar, the other array * (1 / scalar)):
hiperfit.dk/pdf/Doubling.pdf

"The result is an improvement of as much as 2.29 times speedup, on average 1.32 times speedup across a benchmark suite of 15 applications"

The problem with this approach is that it is already difficult enough to handle memory in numpy. Having a cache that potentially stores gigabytes of memory out of the user's sight will just make things worse. This would not be needed if we can come up with a way for Python to help numpy elide the temporaries.
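
The kind of cache meant here is essentially a free list keyed by allocation size, so a hot loop can reuse the buffer it freed in the previous iteration instead of getting fresh, kernel-zeroed pages back from malloc/mmap. A toy, single-slot, non-thread-safe sketch of the idea (not numpy's actual allocator):

    /* Toy single-slot buffer cache that keeps the most recently freed
       buffer around for reuse; illustration only, not thread-safe. */
    #include <stdlib.h>

    static void  *cached_ptr  = NULL;
    static size_t cached_size = 0;

    static void *
    cached_malloc(size_t size)
    {
        if (cached_ptr != NULL && cached_size >= size) {
            void *p = cached_ptr;
            cached_ptr = NULL;
            return p;        /* reuse warm pages, no kernel zeroing */
        }
        return malloc(size);
    }

    static void
    cached_free(void *p, size_t size)
    {
        if (cached_ptr == NULL) {
            cached_ptr = p;  /* keep the buffer for the next allocation */
            cached_size = size;
            return;
        }
        free(p);
    }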
Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes
On 06.06.2014 04:26, Greg Ewing wrote:
> Nathaniel Smith wrote:
>
> I'd be a little nervous about whether anyone has implemented, say, an iadd with side effects such that you can tell whether a copy was made, even if the object being copied is immediately destroyed.
>
> I can think of at least one plausible scenario where this could occur: the operand is a view object that wraps another object, and its __iadd__ method updates that other object. In fact, now that I think about it, exactly this kind of thing happens in numpy when you slice an array! So the opt-in indicator would need to be dynamic, on a per-object basis, rather than a type flag.

Yes, an opt-in indicator would need to receive both operand objects, so it would need to be a slot in the object or number type object.

Would the addition of a tp_can_elide slot to the type object be acceptable for this rather specialized case? tp_can_elide receives the two operands and returns one of three values:

* can work in place, and the operation is commutative (operands may be swapped)
* can work in place, but operands may not be swapped
* cannot work in place

An implementation could look roughly like this:

    TARGET(BINARY_SUBTRACT) {
        /* both operand types get a chance to say whether the temporary
           can be reused */
        fl = left->ob_type->tp_can_elide;
        fr = right->ob_type->tp_can_elide;
        elide = 0;
        if (unlikely(fl)) {
            elide = fl(left, right);
        }
        else if (unlikely(fr)) {
            elide = fr(left, right);
        }
        /* a refcount of 1 means only the interpreter's value stack holds
           the temporary */
        if (unlikely(elide == YES) && left->ob_refcnt == 1) {
            PyNumber_InPlaceSubtract(left, right);
        }
        else if (unlikely(elide == SWAPPABLE) && right->ob_refcnt == 1) {
            PyNumber_InPlaceSubtract(right, left);
        }
        else {
            PyNumber_Subtract(left, right);
        }
    }
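
The extension-type side would then just be a small function filling that slot. A hypothetical sketch for an array-like type (the constants mirror the YES/SWAPPABLE values above; the helper names are invented for illustration and are not actual numpy functions):

    /* Hypothetical tp_can_elide implementation for an array-like extension
       type; the slot, constants and helpers are illustrative only. */
    #include <Python.h>

    enum { ELIDE_NO = 0, ELIDE_YES = 1, ELIDE_SWAPPABLE = 2 };

    /* helpers assumed to exist elsewhere in the extension */
    int myarray_is_view(PyObject *o);
    int myarray_same_shape_and_dtype(PyObject *a, PyObject *b);

    static int
    myarray_can_elide(PyObject *left, PyObject *right)
    {
        /* never elide when an operand is a view sharing memory with another
           object; the "temporary" would then be visible elsewhere */
        if (myarray_is_view(left) || myarray_is_view(right))
            return ELIDE_NO;

        /* element-wise addition of same-shaped arrays can reuse either
           operand's buffer and commutes, so either side may be written to */
        if (myarray_same_shape_and_dtype(left, right))
            return ELIDE_SWAPPABLE;

        return ELIDE_NO;
    }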
[Python-Dev] [numpy wishlist] PyMem_*Calloc
Hi,

In NumPy what we want is the tracing, not the exchangeable allocators. I don't think it is a good idea for the core of a whole stack of C-extension based modules to replace the default allocator, or to allow other modules to replace the allocator NumPy uses.

I think it would be more useful if Python provided functions to register memory allocations and frees, and the tracemalloc module registered handlers for these functions. If no tracer is registered, the functions just return immediately. That way tracemalloc can be used with arbitrary allocators, as long as they register their allocations with Python. For example a hugepage allocator: you would not want to use it as the default allocator for all Python objects, but you may still want to trace its usage:

    my_hugetlb_alloc(size)
    {
        p = mmap('hugepagefs', ..., MAP_HUGETLB);
        PyMem_Register_Alloc(p, size, __func__, __line__);
        return p;
    }

    my_hugetlb_free(p)
    {
        PyMem_Register_Free(p, __func__, __line__);
        munmap(p, ...);
    }

Normally the register calls are no-ops, but if tracemalloc did register tracers the memory is tracked; e.g. the tracemalloc module would do this on start():

    tracercontext.register_alloc = trace_alloc
    tracercontext.register_free = trace_free
    tracercontext.data = mycontext
    PyMem_SetTracer(tracercontext)

Regards,
Julian Taylor
Re: [Python-Dev] How to fix the incorrect shared library extension on linux for 3.2 and newer?
The values of these variables on Mac OS X still look wrong in 3.3.1rc1:

    ./configure --prefix=/Users/jtaylor/tmp/py3.3.1 --enable-shared
    (on macosx-10.8-x86_64)

    sys.version_info(major=3, minor=3, micro=1, releaselevel='candidate', serial=1)
    SO            .so
    EXT_SUFFIX    .so
    SHLIB_SUFFIX  0

The only correct one here is EXT_SUFFIX. SHLIB_SUFFIX should be .dylib (libpython is a .dylib), and SO possibly too, given what it was used for in the past.

3.3.0 also returns wrong values:

    SO            .so
    EXT_SUFFIX    None
    SHLIB_SUFFIX