Re: [Python-Dev] Computed Goto dispatch for Python 2

2015-05-28 Thread Julian Taylor
Won't this need Python compiled with gcc 5.1 to have any effect? Which
compiler version was used for the benchmark?
The issue that negated most computed goto improvements was only closed
very recently (r212172, 9f4ec746affbde1).
Python-Dev mailing list

Re: [Python-Dev] Support for Linux perf

2014-11-22 Thread Julian Taylor
On 17.11.2014 23:09, Francis Giraldeau wrote:
 The PEP-418 is about performance counters, but there is no mention o
 Anyway, I think we must change CPython to support tools such as perf.
 Any thoughts? 

There are some patches available that add systemtap and dtrace probes,
which should at least help with getting function-level profiles.


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-10 Thread Julian Taylor
On 10.10.2014 14:05, Paul Moore wrote:
 On 10 October 2014 10:50, Victor Stinner wrote:
 Is MinGW fully compatible with MSVS ABI? I read that it reuses the
 MSVCRT, but I don't know if it's enough. I guess that a full ABI
 compatibility means more than just using the C library, calling
 convention and much more.
 MinGW can be made to build ABI-compatible extensions. Whether this
 will continue with MSVC 15 I don't know, as it requires a change to
 add an interface library for the relevant msvcrXX runtime. And the
 MinGW community is somewhat fragmented these days, with the core
 project not supporting 64-bit and various external projects doing so.
 Having said all this, it *is* possible with some effort to use MinGW
 to build Python extensions. As noted, the numpy developers have done a
 lot of work on this as some of the libraries they need must be built
 with mingw. And the state of distutils support for mingw is very sad,
 as well, IIRC (last time I looked there were a number of open bugs,
 and very little movement on them).
 Rather than put effort into more build options for CPython, I think it
 would be much more beneficial to the Windows community if effort was
 put into:
 2. Looking at ways to support cross-compiling Windows extensions from
 Linux using mingw. I've no idea how practical this would be, but if
 Linux developers could provide Windows builds without having to
 maintain a Windows environment, that would be great.

It is practical. Numpy Windows binaries are built on Linux using mingw
3.4.5 and wine.
The (vagrant based) setup that is currently used is available online.
For the next release we do want to look into providing official win64
binaries based on the mingw64 toolchain that has been mentioned a few
times already. An attempt to do so in the last release failed due to
test issues, and there were no experienced debuggers available to solve them.

From my perspective cross building for Windows is easier than cross
building for Mac, but that's probably just because I never seriously
looked into that.

Re: [Python-Dev] sum(...) limitation - temporary elision take 2

2014-08-11 Thread Julian Taylor
On 04.08.2014 22:22, Jim J. Jewett wrote:
 Sat Aug 2 12:11:54 CEST 2014, Julian Taylor wrote:
 Andrea Griffini wrote:
However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.
 hm could this be a pure python case that would profit from temporary
 elision [0]?
 lists could declare the tp_can_elide slot and call list.extend on the
 temporary during its tp_add slot instead of creating a new temporary.
 extend/realloc can avoid the copy if there is free memory available
 after the block.
 Yes, with all the same problems.
 When dealing with a complex object, how can you be sure that __add__
 won't need access to the original values during the entire computation?
 It works with matrix addition, but not with matrix multiplication.
 Depending on the details of the implementation, it could even fail for
 a sort of sliding-neighbor addition similar to the original justification.

The c-extension object knows what its add slot does. An object that
cannot elide would simply always return 0 indicating to python to not
call the inplace variant.
E.g. the numpy __matmul__ operator would never tell python that it can
work inplace, but __add__ would (if the arguments allow it).

Though we may have found a way to do it without the direct help of
Python, it involves reading and storing the current instruction of
the frame object to figure out if it is called directly from the
interpreter. See the can_elide_temp function in the unfinished patch
to numpy.
This is probably not the best way, as it is hardly an intended use of
the Python C-API, but assuming there is no overlooked issue with this
approach it could be a good workaround for known good Python versions.

Re: [Python-Dev] sum(...) limitation

2014-08-02 Thread Julian Taylor
On 02.08.2014 08:35, Terry Reedy wrote:
 On 8/2/2014 1:57 AM, Allen Li wrote:
 On Fri, Aug 01, 2014 at 02:51:54PM -0700, Guido van Rossum wrote:
 No. We just can't put all possible use cases in the docstring. :-)

 On Fri, Aug 1, 2014 at 2:48 PM, Andrea Griffini wrote:

  help(sum) tells clearly that it should be used to sum numbers
 and not
  strings, and with strings actually fails.

  However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.

  Is this to be considered a bug?

 Can you explain the rationale behind this design decision?  It seems
 terribly inconsistent.  Why are only strings explicitly restricted from
 being sum()ed?  sum() should either ban everything except numbers or
 accept everything that implements addition (duck typing).
 O(n**2) behavior, ''.join(strings) alternative.

hm could this be a pure python case that would profit from temporary
elision [0]?

lists could declare the tp_can_elide slot and call list.extend on the
temporary during its tp_add slot instead of creating a new temporary.
extend/realloc can avoid the copy if there is free memory available
after the block.
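
To make the quadratic behaviour and the proposed effect concrete, here is a minimal sketch in plain Python. The helper names concat_with_sum and concat_with_extend are made up for this illustration; the second function only mimics by hand what an eliding interpreter might do, it is not how sum() is implemented.

```python
def concat_with_sum(lists):
    # Each step evaluates acc + item, which allocates a new list and
    # copies everything accumulated so far: O(n**2) copying overall.
    return sum(lists, [])


def concat_with_extend(lists):
    # One accumulator grown in place, which is the effect that eliding
    # the temporary would have.
    result = []
    for item in lists:
        result.extend(item)
    return result


data = [[1, 2, 3], [4], [], [5, 6]]
assert concat_with_sum(data) == concat_with_extend(data)
```

Both give the same list; only the number of intermediate copies differs.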


Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-06 Thread Julian Taylor
On 06.06.2014 04:18, Sturla Molden wrote:
 On 05/06/14 22:51, Nathaniel Smith wrote:
 This gets evaluated as:

 tmp1 = a + b
 tmp2 = tmp1 + c
 result = tmp2 / c

 All these temporaries are very expensive. Suppose that a, b, c are
 arrays with N bytes each, and N is large. For simple arithmetic like
 this, then costs are dominated by memory access. Allocating an N byte
 array requires the kernel to clear the memory, which incurs N bytes of
 memory traffic.
 It seems to be the case that a large portion of the run-time in Python
 code using NumPy can be spent in the kernel zeroing pages (which the
 kernel does for security reasons).
 I think this can also be seen as a 'malloc problem'. It comes about
 because each new NumPy array starts with a fresh buffer allocated by
 malloc. Perhaps buffers can be reused?

Caching memory inside of numpy would indeed solve this issue too. There
has even been a paper written on this which contains some more serious
benchmarks than the laplace case, which runs on very old hardware (and
the inplace and out of place cases are actually not the same: one
computes array/scalar, the other array * (1 / scalar)).
The result is an improvement of as much as 2.29 times speedup, on
average 1.32 times speedup across a benchmark suite of 15 applications.

The problem with this approach is that it is already difficult enough to
handle memory in numpy. Having a cache that potentially stores gigabytes
of memory out of the user's sight will just make things worse.

This would not be needed if we can come up with a way for Python to
help numpy elide the temporaries.
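
As a rough illustration of what eliding the temporaries would save, here is a sketch with a toy class instead of numpy. Vec, count_allocations, naive and elided are hypothetical names for this example; the allocation counter simply counts constructor calls and says nothing about real memory traffic.

```python
class Vec:
    """Toy stand-in for an array; counts how many buffers are created."""

    allocations = 0

    def __init__(self, values):
        Vec.allocations += 1
        self.values = list(values)

    def __add__(self, other):
        return Vec(x + y for x, y in zip(self.values, other.values))

    def __iadd__(self, other):
        for i, y in enumerate(other.values):
            self.values[i] += y
        return self

    def __truediv__(self, other):
        return Vec(x / y for x, y in zip(self.values, other.values))

    def __itruediv__(self, other):
        for i, y in enumerate(other.values):
            self.values[i] /= y
        return self


def count_allocations(func, *args):
    # Returns the result and how many Vec buffers func created.
    before = Vec.allocations
    result = func(*args)
    return result, Vec.allocations - before


def naive(a, b, c):
    # tmp1 = a + b, tmp2 = tmp1 + c, result = tmp2 / c
    return (a + b + c) / c


def elided(a, b, c):
    tmp = a + b   # the only new buffer
    tmp += c      # reuse the temporary in place
    tmp /= c      # reuse it again
    return tmp


a, b, c = Vec([1.0, 2.0]), Vec([3.0, 4.0]), Vec([2.0, 4.0])
r_naive, n_naive = count_allocations(naive, a, b, c)
r_elided, n_elided = count_allocations(elided, a, b, c)
assert r_naive.values == r_elided.values
assert (n_naive, n_elided) == (3, 1)
```

The hand-written elided() is what users have to do today; the wish is for the interpreter to make the naive spelling behave the same way.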

Re: [Python-Dev] [numpy wishlist] Interpreter support for temporary elision in third-party classes

2014-06-06 Thread Julian Taylor
On 06.06.2014 04:26, Greg Ewing wrote:
 Nathaniel Smith wrote:
 I'd be a little nervous about whether anyone has implemented, say, an
 iadd with side effects such that you can tell whether a copy was made,
 even if the object being copied is immediately destroyed.
 I can think of at least one plausible scenario where
 this could occur: the operand is a view object that
 wraps another object, and its __iadd__ method updates
 that other object.
 In fact, now that I think about it, exactly this
 kind of thing happens in numpy when you slice an array.
 So the opt-in indicator would need to be dynamic, on
 a per-object basis, rather than a type flag.

yes an opt-in indicator would need to receive both operand objects so it
would need to be a slot in the object or number type object.
Would the addition of a tp_can_elide slot to the object types be
acceptable for this rather specialized case?

tp_can_elide receives two objects and returns one of three values:
* can work inplace, operation is associative
* can work inplace but not associative
* cannot work inplace

Implementation could e.g. look about like this:

   fl = left->obj_type->tp_can_elide
   fr = right->obj_type->tp_can_elide
   elide = 0
   if (unlikely(fl)) {
       elide = fl(left, right)
   }
   else if (unlikely(fr)) {
       elide = fr(left, right)
   }
   if (unlikely(elide == YES) && left->refcnt == 1) {
       PyNumber_InPlaceSubtract(left, right)
   }
   else if (unlikely(elide == SWAPPABLE) && right->refcnt == 1) {
       PyNumber_InPlaceSubtract(right, left)
   }
   else {
       PyNumber_Subtract(left, right)
   }
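
The same dispatch can be modelled in pure Python to play with the idea. This is only a sketch under several assumptions: Elide, Buf, can_elide and add_with_elision are hypothetical names, the refcount checks are replaced by explicit left_is_temp / right_is_temp flags (Python code cannot reliably observe what the interpreter sees), addition is used instead of subtraction so that swapping operands is valid, and SWAPPABLE is read as also permitting in-place work on the left operand.

```python
import operator
from enum import Enum


class Elide(Enum):
    NO = 0         # cannot work in place
    YES = 1        # can work in place on the left operand only
    SWAPPABLE = 2  # can work in place, operands may be swapped


class Buf:
    """Hypothetical type that opts in to elision for addition."""

    def __init__(self, values):
        self.values = list(values)

    @staticmethod
    def can_elide(left, right):
        # Decided per pair of objects, not per type.
        if (isinstance(left, Buf) and isinstance(right, Buf)
                and len(left.values) == len(right.values)):
            return Elide.SWAPPABLE
        return Elide.NO

    def __add__(self, other):
        return Buf(x + y for x, y in zip(self.values, other.values))

    def __iadd__(self, other):
        for i, y in enumerate(other.values):
            self.values[i] += y
        return self


def add_with_elision(left, right, left_is_temp=False, right_is_temp=False):
    # Look up the opt-in hook on either operand's type, left first.
    hook = (getattr(type(left), "can_elide", None)
            or getattr(type(right), "can_elide", None))
    elide = hook(left, right) if hook else Elide.NO
    if elide in (Elide.YES, Elide.SWAPPABLE) and left_is_temp:
        return operator.iadd(left, right)
    if elide is Elide.SWAPPABLE and right_is_temp:
        return operator.iadd(right, left)
    return left + right


a, b = Buf([1, 2]), Buf([3, 4])
fresh = add_with_elision(a, b)                      # neither is a temporary
reused = add_with_elision(fresh, b, left_is_temp=True)
assert reused is fresh and reused.values == [7, 10]
```

Types without the hook, such as int, fall through to the ordinary out-of-place operation.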

[Python-Dev] [numpy wishlist] PyMem_*Calloc

2014-04-16 Thread Julian Taylor
In NumPy what we want is the tracing, not the exchangeable allocators.
I don't think it is a good idea for the core of a whole stack of
C-extension based modules to replace the default allocator or allowing
other modules to replace the allocator NumPy uses.

I think it would be more useful if Python provides functions to
register memory allocations and frees and the tracemalloc module
registers handlers for these register functions.
If no allocation tracer is registered the functions just return.
That way tracemalloc can be used with arbitrary allocators as long
as they register their allocations with Python.

For example a hugepage allocator, which you would not want to use
as the default allocator for all Python objects, but whose usage you
may still want to trace:

p = mmap('hugepagefs', ..., MAP_HUGETLB);
PyMem_Register_Alloc(p, size, __func__, __LINE__);
return p;

PyMem_Register_Free(p, __func__, __LINE__);
munmap(p, ...);

Normally the register functions are no-ops, but if tracemalloc has
registered tracers the memory is tracked, e.g. the trace module does
this on start():
tracercontext.register_alloc = trace_alloc
tracercontext.register_free = trace_free = mycontext
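
The registration pattern can be sketched in Python to show the intended control flow. Everything here is hypothetical and only models the proposal: TracerContext, mem_register_alloc, mem_register_free and SimpleTracer are made-up names, not part of CPython or tracemalloc.

```python
class TracerContext:
    """Holds the optional hooks; None means nobody is tracing."""

    def __init__(self):
        self.register_alloc = None
        self.register_free = None


tracer_context = TracerContext()


def mem_register_alloc(address, size, where):
    # Called by any allocator after it obtained memory; a no-op
    # unless a tracer installed a hook.
    hook = tracer_context.register_alloc
    if hook is not None:
        hook(address, size, where)


def mem_register_free(address, where):
    # Called by the allocator before it releases memory.
    hook = tracer_context.register_free
    if hook is not None:
        hook(address, where)


class SimpleTracer:
    """Stand-in for what a tracing module would install on start()."""

    def __init__(self):
        self.live = {}

    def trace_alloc(self, address, size, where):
        self.live[address] = (size, where)

    def trace_free(self, address, where):
        self.live.pop(address, None)

    def start(self):
        tracer_context.register_alloc = self.trace_alloc
        tracer_context.register_free = self.trace_free

    def stop(self):
        tracer_context.register_alloc = None
        tracer_context.register_free = None


# Without a tracer the calls do nothing.
mem_register_alloc(0x1000, 4096, "hugepage_alloc")

tracer = SimpleTracer()
tracer.start()
mem_register_alloc(0x2000, 2 * 1024 * 1024, "hugepage_alloc")
assert 0x2000 in tracer.live
mem_register_free(0x2000, "hugepage_free")
assert not tracer.live
tracer.stop()
```

The allocator itself stays untouched; only the bookkeeping is routed through the hooks.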

Julian Taylor

Re: [Python-Dev] How to fix the incorrect shared library extension on linux for 3.2 and newer?

2013-04-04 Thread Julian Taylor

The values on macos for these variables still look wrong in 3.3.1rc1:

./configure --prefix=/Users/jtaylor/tmp/py3.3.1 --enable-shared
on macosx-10.8-x86_64

sys.version_info(major=3, minor=3, micro=1, releaselevel='candidate', 

SO .so

The only correct one here is EXT_SUFFIX; SHLIB_SUFFIX should be .dylib
(libpython is a .dylib) and SO possibly too, given what it was used for
in the past.

3.3.0 also returns wrong values
SO .so
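
For anyone who wants to check their own build, a small sketch that queries the three variables through the sysconfig module. It is an assumption that this is how the values above were obtained, and suffix_report is a made-up helper name. Note that newer Python versions deprecated and later dropped SO, so it may come back as None.

```python
import sysconfig
import warnings


def suffix_report():
    # get_config_var returns None for variables the build does not
    # define, so this is safe on versions that lack one of the names.
    names = ("EXT_SUFFIX", "SHLIB_SUFFIX", "SO")
    with warnings.catch_warnings():
        # Some versions warn that SO is deprecated; ignore that here.
        warnings.simplefilter("ignore", DeprecationWarning)
        return {name: sysconfig.get_config_var(name) for name in names}


for name, value in suffix_report().items():
    print(name, value)
```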