Hi Thierry,

On Wed, Sep 17, 2014 at 9:57 AM, Thierry Goubier <thierry.goub...@gmail.com>
wrote:

> Hi Eliot,
>
> Le 17/09/2014 00:12, Eliot Miranda a écrit :
>
>> Hi Thierry,
>>
>> On Tue, Sep 16, 2014 at 1:06 AM, Thierry Goubier
>> <thierry.goub...@gmail.com <mailto:thierry.goub...@gmail.com>> wrote:
>>
>> There is no "outside" in Sista.  It is an image-level optimizer, so
>> you'll be able to interact with it at the same level one interacts with
>> the Opal compiler.  Of course doing this kind of strength-reduction is
>> possible.
>>
>
> Ok. I'll have a look and maybe profile a bit; SmaCC has plenty of those
> optimisations that I had to remove in the Pharo port, and I should have a
> look at the cost of doing all those Character value: XXX and ^ Dictionary
> new...
>
> I also had nice optimisations in VW for my trace tool (like replacing a
> class lookup by a Literal which was then replaced by the singleton instance
> in the method bytecode) which do not have any effect in Squeak/Pharo
> anymore.


Yes, and various dialects have a literal constructor for "compile-time
expressions" (I did one for VW) that provides a manual (and hence fragile)
way of doing this.  As you pointed out, this approach breaks when one
changes the previously compiled code.  Having Sista regenerate for you
automatically is of course preferable.  But what do you mean by "do not have
any effect in Squeak/Pharo anymore"?
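For anyone following along, the kind of literal constructor I mean looks roughly like this, using VW's ##(...) syntax (a sketch; the particular expression is just an illustration, not one of SmaCC's actual optimisations):

```smalltalk
"Without a literal constructor: the Character value: send is
 re-executed every time the method runs."
scanCharacter := Character value: 255.

"With VW's ##(...) literal constructor: the expression is evaluated
 once at compile time and the result stored as a method literal.
 Fragile, because if the underlying code changes, nothing recompiles
 the literal for you -- which is why an automatic optimizer is nicer."
scanCharacter := ##(Character value: 255).
```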


>> I have no idea.  All I know is that the code generation side of the
>> Cogit is extremely cheap, dwarfed by the costs of relinking when code is
>> recompiled.  But the cost is vanishingly small.  The costs that do show
>> up in the Cogit are the cost of discarding, compacting and relinking
>> code when the working set grows (e.g. Compiler recompileAll), and pause
>> times here are ~ 1ms on my 2.2GHz MBP; that's about twice the cost of a
>> scavenge, at about 1/20th the frequency (these times are for Spur). If
>> you open up the system reporter you'll be able to see for yourself.  And
>> if you use the VMProfiler you can see exactly where the time goes for
>> your favourite synthetic benchmark.  See if you can figure out how to
>> create bytecoded methods more cheaply than the JIT can compile them and
>> profile it.
>>
>
> I have colleagues in the same department working on localized,
> template-based JIT of computational kernels, and they spend about 3
> instructions of JIT work per instruction generated (and prove that they go
> faster on some matrix math code, including the JIT overhead, than anything
> statically optimised off-line).
>

Sure.  My JIT optimizes and cannot be reduced to a template approach.  But
it is still extremely fast.  As I said, scanning code to relink/compact/GC
is what takes the time, not compilation itself.  And the pause times here
are bounded (see below).
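To make that concrete, here is the sort of crude measurement one can do in a Squeak/Pharo workspace (a sketch; recompileAll is the same stressor mentioned above, and exact selectors can vary between images):

```smalltalk
"Time a full recompile. This forces the Cogit to discard, compact and
 relink machine code as the working set churns -- which is where the
 pauses come from, not from compiling any individual method."
Transcript
	show: (Time millisecondsToRun: [Compiler recompileAll]) printString;
	show: ' ms';
	cr.
```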

> At the same time, a one ms pause at 2.2GHz is a hell of a long time if your
> target is a MicroBlaze at 50 MHz or an embedded PowerPC at 25 MHz.


Sure, there are bound to be contexts in which Cog can't meet the required
deadlines.  But that doesn't imply it can never do real time, does it?  And
processor speeds are climbing.  Look at Raspberry Pi, 700MHz.  Not too
shabby.


>> If you've got a tool that can count instructions let me know and I may
>> measure.  I could count bytecodes per instruction generated in the
>> simulator ;-).
>>
>
> I used some of the intel cycle CPU counters in the past to benchmark code.
> I'll ask around for accessing Intel's cycle count registers. I'm sure the
> Bochs x86 simulator has those.
>
>          b) it would be easy to extend the VM to cause the Cogit to
>>         precompile specified methods.  We could easily provide a
>>         "lock-down" facility that would prevent Cog from discarding
>>         specific machine code methods.
>>
>>
>>     That one is interesting. It sounds like a great benefit would be to
>>     have a limited API to the Cog JIT for that sort of requirements.
>>
>>             And of course, you have to perform lot of profiling.
>>
>>
>>         Early and often :-).
>>
>>         Because we can have complete control over the optimizer, and
>>         because Sista is bytecode-to-bytecode and can hence store its
>>         results in the image in the form of optimized methods, I believe
>>         that Sista is well-positioned for real-time since it can be used
>>         before deployment.  In fact we should emphasise this in the
>>         papers we write on Sista.
>>
>>
>>     I'm not sure that "real-time" aspect is a good idea. More
>>     real-time than other dynamic optimisers, maybe. Note that Java
>>     Real-Time suggests turning off the JIT and switching to ITC.
>>
>>
>> But that's because in Java VMs the adaptive optimizer is in the VM where
>> one has limited control over it.  Sista is different.
>>
>
> The problem is that even a simple JiT is considered too much Jitter :)
>

In some contexts, yes.  But that doesn't define the entire real-time
world.

> The OCaml guys once told me they had to do a special, interpreted VM for a
> real-time use case. The problem is not going fast, the problem is being
> predictable. They turn off CPU caches in some cases.
>

Sure.  But if one can show that the JIT's pause times are bounded and well
spaced then there is no fundamental problem.  People have been implementing
real-time GCs and using GC in real-time systems for over 20 years now.
This is no different.  But you seem adamant that there is no possibility
of using a JIT in real time.  I think that claim is clearly false.

-- 
best,
Eliot
