Re: [pypy-dev] array performace?

William Leslie Sat, 03 Jul 2010 07:26:27 -0700

On 3 July 2010 08:56, Bengt Richter <[email protected]> wrote:
> On 07/02/2010 11:35 AM Carl Friedrich Bolz wrote:
> A thought/question:
>
> Could/does JIT make use of information in an assert statement? E.g., could we 
> write
>     assert set(type(x) for x in img) == set([float]) and len(img)==640*480
> in front of a loop operating on img and have JIT use the info as assumed true
> even when "if __debug__:" suites are optimized away?

There are several reasons we can't make use of such information from
the JIT at the moment. It requires more information than we have, and
it is difficult to analyse quickly. If img is visible from outside the
current thread, for example, the ad-hoc memory model of the Python
language means we would have to order writes and reads to img from
other threads with the JIT's own accesses. Similarly, functions that
we call may insert objects that break this invariant. Determining when
this may occur requires analysing a lot of code: for example, if even
*one* of the types involved was not float, it could implement a __radd__
method that broke the invariant. It's typically faster to just execute
the code than to find out.
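Here is a hypothetical sketch of the __radd__ case (the names Weight and brighten are invented for illustration). The assert holds when the loop starts, but an operand that is not a float gets control inside `+` and changes the list partway through. A JIT that trusted the assert for the whole loop would then read a str where it expected an unboxed float.

```python
class Weight:
    """Hypothetical non-float operand: its __radd__ runs arbitrary code."""

    def __init__(self, img):
        self.img = img

    def __radd__(self, other):
        # Side effect hidden inside "+": overwrite the last pixel with a str.
        self.img[-1] = "surprise"
        return other


def brighten(img, w):
    # The invariant a JIT would like to assume for the whole loop.
    assert all(type(x) is float for x in img)
    out = []
    for i in range(len(img)):
        # float.__add__(Weight) returns NotImplemented, so Python falls back
        # to w.__radd__(img[i]), i.e. arbitrary user code inside the loop.
        out.append(img[i] + w)
    return out
```

Calling brighten(img, Weight(img)) with img = [1.0, 2.0, 3.0] passes the assert on entry. By the time the loop reaches the last index, however, that element is already a str, because the very first addition ran user code that rewrote the list.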

In the presence of whole-program optimisation this sort of thing is
possible, and with the right analysis it may even be possible within
the JIT, but it remains an open question whether it would be
profitable. (This is an area I have been exploring, but don't hold
your breath for results.)

On 3 July 2010 10:38, Bengt Richter <[email protected]> wrote:
> On 07/02/2010 04:14 PM Amaury Forgeot d'Arc wrote:
>> If efficient python code needs this, I'd better write the loop in C
>> and explicitly choose the types.
>> The C code could be inlined in the python script, and compiled on demand.
>> At least you'll know what you get.
>>
> Well, even C accepts hints like 'register' (and may ignore you, so you are 
> not truly sure what you get ;-)
>
> The point of using assert would be to let the user remain within the python 
> language, while still passing
> useful hints to the compiler.

Interesting you mention Racket. Racket comes with a static language
that integrates with their usual dynamic Scheme, and many Common Lisp
implementations provide optional typing. Paolo recently bemoaned the
trend toward writing modules at interp-level for speed* (I'm not
really sure whether it is a trend now or not), but at some point it
might be fun to look at optional typing annotations that compile the
case for those assumptions. It might be a precursor to Cython or Pyrex
support.

* With justification: though OK for the stdlib, retranslating PyPy
every time you add an extension module is going to get old. Fast.
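One reading of "compile the case for those assumptions" is to treat annotations as guarded assumptions: run a specialised version when the arguments match the annotated types, and fall back to the generic code otherwise, much as a JIT guards a specialised trace. The sketch below shows only that dispatch structure in plain Python. The specialize decorator and the mul_add functions are hypothetical, and nothing here actually emits unboxed code.

```python
import inspect


def specialize(generic):
    """Hypothetical decorator: guard on the annotated parameter types of
    the decorated function, and fall back to `generic` when they fail."""

    def wrap(fast):
        params = inspect.signature(fast).parameters.values()
        types = [p.annotation for p in params]

        def dispatch(*args):
            # Guard: exact type checks, like a guard on a specialised trace.
            if len(args) == len(types) and all(
                type(a) is t for a, t in zip(args, types)
            ):
                dispatch.hits += 1
                return fast(*args)
            # Assumptions failed: take the generic path.
            dispatch.misses += 1
            return generic(*args)

        dispatch.hits = dispatch.misses = 0
        return dispatch

    return wrap


def mul_add_generic(a, b, c):
    # Works for anything supporting * and + (ints, Fractions, user classes).
    return a * b + c


@specialize(mul_add_generic)
def mul_add(a: float, b: float, c: float):
    # Only ever reached with three exact floats, so a real compiler could
    # use unboxed double arithmetic here; in pure Python it is the same code.
    return a * b + c
```

With this sketch, mul_add(2.0, 3.0, 1.0) goes through the guarded fast case, while mul_add(2, 3, 1) fails the guard and runs the generic body. Both return the same value, which is the property such a scheme has to preserve.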

> Could such assertions allow e.g. a list to be implemented as a homogeneous 
> vector
> of unboxed representations?

PyPy is already great in terms of data layout; for example, it uses
shadow classes in the form of 'structures'. Supporting more
complicated layout optimisations (such as row- or column-order storage
for structures, so the JIT can do relational algebra) would probably be
unique. It doesn't seem so far off, considering that in the progression
(list int) -> (list unpacked tuple int) -> (list unpacked homogeneous
structure), the first step, limiting or otherwise determining the item
type, is the most complicated.
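For comparison, here is a hedged sketch of doing this by hand today with the standard library's array module. The field names and the weighted_sum helper are invented for illustration. Each array("d") is a homogeneous buffer of unboxed C doubles, and keeping each field in its own array is the column-order layout mentioned above. This is an explicit, opt-in layout chosen by the programmer, not something PyPy's JIT derives automatically.

```python
from array import array

# Row order ("array of structures"): a list of boxed tuples, one per pixel.
rows = [(1.0, 0.5), (2.0, 0.25), (3.0, 0.125)]

# Column order ("structure of arrays"): one homogeneous buffer per field.
# array("d") stores raw C doubles rather than pointers to float objects.
cols = {
    "value": array("d", (r[0] for r in rows)),
    "alpha": array("d", (r[1] for r in rows)),
}


def weighted_sum(cols):
    # Reads each field from its own contiguous buffer of doubles.
    return sum(v * a for v, a in zip(cols["value"], cols["alpha"]))


# Homogeneity is enforced when items are stored, not merely assumed.
try:
    cols["value"].append("not a float")
    rejected = False
except TypeError:
    rejected = True
```

Here weighted_sum(cols) gives 1.375 and rejected is True. Note that typecode "d" also accepts ints, converting them to floats as they are stored, so "homogeneous" means "stored as doubles" rather than "only float objects were ever passed in".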

> If I wanted to mix languages (not uninteresting!), I'd go with
> racket (the star formerly known as PLT-scheme)

-- possible can of worms --

As for mixing languages, that is the pinnacle of awesome, but this is
probably not the list for it. Take multi-language VMs (MLVMs) such as
JVM+JSR-292, Racket, GNU Guile, and Parrot: it seems to me that once
you settle on an execution/object model and/or bytecode format, you've
already decided which language(s) (where the 's' seems superfluous)
will get first-class support. Don't get me wrong, I find each of these
really exciting, but good multi-platform integration is a much harder
problem than writing a few compilers with a common bytecode format.
Even the common bytecode format is probably not a good idea, because
different languages need (really) different primitives, as Pirate has
brought out.

Other impedance mismatches, such as calling conventions (e.g.
JavaScript and Lua functions silently accepting an incorrect number of
arguments), reduction strategies (applicative vs normal order vs
call-by-name), mutable strings, TCE (tail-call elimination), various
type systems involving structural types, Oliveira/Sulzmann classes,
existential types, dependent types, value types, single and multiple
inheritance, and the completely insane (Prolog) make implementing real
multi-language platforms a mammoth task. And even if you manage to get
that working, how do you make exception hierarchies work? Why can't I
cast my Java ArrayList as a C# ArrayList? etc.

Sure, you could probably hook up a few of the bundled VMs; Io or E
would make for a great Twisted integration DSL. But actually
convincing people to lock themselves into an unstandardised, unproven
chimera? Let's just say that doing multi-language right is NP-hard.
Doing it while targeting the JVM and CLI, offering platform integration
while supporting exotic language constructs like real continuations?
Likely impossible. It's a nice idea, but probably out of PyPy's scope.

-- 
William Leslie
_______________________________________________
[email protected]
http://codespeak.net/mailman/listinfo/pypy-dev
