Hi Carl,

On +2020-12-24 07:06:43 +0100, Carl Friedrich Bolz-Tereick wrote:
> On 23.12.20 14:42, PIERRE AUGIER wrote:
> > I wrote another very simple benchmark that should not depend on
> > auto-vectorization. The bench function is:
> >
> > ```python
> > def sum_x(positions):
> >     result = 0.0
> >     for i in range(len(positions)):
> >         result += positions[i].x
> >     return result
> > ```
>
> This benchmark probably really shows the crux of the problem. In Python,
> the various Points instances (whether with lists, or with direct
> attributes) are vastly more complex beasts than the structs in Julia.
> There, you can declare a struct with a certain number of Float64 fields
> and be done. Thus, reading .x from such a struct is just a pointer
> dereference.
>
> In Python, due to dynamic typing, the ability to add more fields later,
> and even the ability to change the class of an instance, the actual
> memory layout of a Point3D type is much more complex, with various
> indirections and boxing. Reading .x out of such a thing is done in
> several steps:
>
> 1) check that positions[i] is an instance
> 2) check that it's an instance of Point3D
> 3) read its x field
> 4) check that the field is a float
> 5) read the float's value
>
> All of these steps involve a pointer read.
>
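Spelled out as plain Python, one iteration of that attribute read looks
roughly like the sketch below. The `read_x` helper, its asserts, and this
bare-bones `Point3D` are only illustrative; they are not what PyPy actually
emits internally:

```python
class Point3D:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

def read_x(positions, i):
    obj = positions[i]                # a generic boxed object out of the list
    assert isinstance(obj, Point3D)   # steps 1-2: it is an instance, and of Point3D
    value = obj.__dict__['x']         # step 3: read the x field (still a boxed float)
    assert type(value) is float       # step 4: check the field really holds a float
    return value                      # step 5: the float's value is unboxed when used
```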
Could crafted asserts tell the compiler that dynamic stuff is not happening?
Is the compiler listening? :)

> Improving this situation is probably possible (there's even a paper on how
> to get rid of steps 1 and 2:
> https://www.csl.cornell.edu/~cbatten/pdfs/cheng-type-freezing-cgo2020.pdf
> but the work wasn't merged). But there are problems:
> - basically every single one of these steps needs to be addressed, and
>   every one is its own optimization
> - it's extremely delicate to get the balance and the trade-offs right,
>   because the object system is so central in getting good performance for
>   Python code across a wide variety of areas (not just numerical algorithms).
>
> Another approach would indeed be (as you say in the other mail) to add
> support for telling PyPy explicitly that some list can contain only
> instances of a specific class and (more importantly) that a class is not
> to be considered "dynamic", meaning that its fields are fixed and of
> specific types. So far, we have not really gone in such directions,
> because that is language design and we leave that to the CPython devs ;-).

assert is already there :)

> Note that some of your other benchmarks are not measuring what you hope!
> e.g. I suspect that get_objects, get_xs and loop_over_list_of_objects from
> your other mail get completely removed by the Julia compiler, since they
> don't have side effects and don't compute anything. PyPy isn't actually
> able to remove empty loops. So you are comparing empty loops in PyPy
> with no code at all in Julia.

And benchmarks that just average, without segregating outliers or clusters,
are terrible. Scattergrams are much more informative :) E.g., in a
scattergram you can discover that a CPU shortcutting multiply by zero
affects a subset of your computations. (A minimal timing sketch follows at
the very end of this message.)

> Cheers,
>
> Carl Friedrich
> _______________________________________________
> pypy-dev mailing list
> pypy-dev@python.org
> https://mail.python.org/mailman/listinfo/pypy-dev

Happy Holidays!
--
Regards,
Bengt Richter
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev
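For the scattergram point above, a minimal sketch, assuming matplotlib is
available: it times each run of `sum_x` separately with `time.perf_counter`
and plots the raw samples instead of collapsing them into one average. The
`Point3D` class, the run count, and the list size are arbitrary choices for
illustration, not taken from PIERRE's benchmarks.

```python
import time
import matplotlib.pyplot as plt

class Point3D:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

def sum_x(positions):
    result = 0.0
    for i in range(len(positions)):
        result += positions[i].x
    return result

# build a workload mirroring the benchmark quoted earlier in the thread
positions = [Point3D(float(i), 2.0, 3.0) for i in range(100_000)]

# one timing sample per run, no averaging
samples = []
for _ in range(200):
    t0 = time.perf_counter()
    sum_x(positions)
    samples.append(time.perf_counter() - t0)

# scattergram: outliers and clusters stay visible
plt.scatter(range(len(samples)), samples, s=4)
plt.xlabel("run index")
plt.ylabel("seconds per run")
plt.show()
```

Bimodal clusters, warm-up effects, or a drifting baseline that a single mean
would hide show up immediately in such a plot.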