Re: EXT: Re: series of switchpoints or better

2016-10-05 Thread Jochen Theodorou

On 05.10.2016 21:45, Charles Oliver Nutter wrote:

On Wed, Oct 5, 2016 at 1:36 PM, Jochen Theodorou wrote:

If I hear Remi saying volatile read... then it does not sound free
to me actually. In my experience volatile reads still present
inlining barriers. But if Remi and all of you tell me it is still
basically free, then I will not look too much at the volatile ;)

The volatile read is only used in the interpreter.


Ah... I see... nice. I get the feeling Remi actually already said this...


In Groovy we use SwitchPoint as well, but only one for the whole
meta class system, which it seems could clearly be improved. Having
a SwitchPoint per method is actually a very interesting approach I
would not have considered before, since it means creating a ton of
SwitchPoint objects. Not sure if that works in practice for me, since
it is difficult to make a switchpoint for a method that does not
exist in the super class but may come into existence later on -
still, it seems I should be considering this.

I suspect Groovy developers are also less likely to modify classes at
runtime? In Ruby, it's not uncommon to keep creating new classes or
modifying existing ones at runtime, though it is generally discouraged
(all runtimes suffer).


It depends a bit on coding style how often it is done, but I think the
majority barely changes classes at runtime - compared to Ruby probably
a lot less.


We have a construct that dynamically adds methods to multiple classes
with limited thread visibility and lifetime (Categories), but those are
actually not realized as meta class changes. Creating a new class can
happen at any time, but classes tend not to be built up incrementally;
they are usually declared with all the methods you want already in them.



cold performance is a consideration for me as well though. The heavy
creation time of MethodHandles is one of the reasons we do not use
invokedynamic as much as we could... especially considering that
creating a new cache entry via runtime class generation and still
invoking the method via reflection is actually faster than producing
one of our complex method handles right now.

Creating a new cache entry via class generation? Can you elaborate on
that? JRuby has a non-indy mode, but it doesn't do any code generation
per call site.


Well, the code generation is optional; otherwise we use reflection in
that mode. We have used the technique since, I think, 2008. Basically you
have an interface with a call(Object[]) method, we produce an
implementation of it at runtime and then call through it. We use
MagicAccessorImpl to avoid bytecode verification... well, if it still
exists and is accessible - not sure that is still the case in JDK 9 though.
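
A minimal sketch of the idea (names made up for illustration, not the real
Groovy classes; passing the receiver as the first array element is just an
assumption):

// the per-method invoker interface; one implementation of it is generated
// at runtime for each cached method
interface GeneratedCaller {
    Object call(Object[] args) throws Throwable;
}

// what a generated implementation would look like if written by hand, here
// for String.substring(int), with the receiver assumed to be args[0]; the
// real generated class would additionally extend sun.reflect.MagicAccessorImpl
// so the JVM skips bytecode verification for it, which plain Java source
// cannot do
final class SubstringCaller implements GeneratedCaller {
    public Object call(Object[] args) {
        return ((String) args[0]).substring((Integer) args[1]);
    }
}

// at the call site the cached entry is then invoked through the interface
// instead of via Method.invoke, e.g.:
//   GeneratedCaller caller = cache.lookup(receiverClass, "substring");
//   Object result = caller.call(new Object[] { receiver, 1 });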


[...]

Ahh, so when you invalidate, you only invalidate one class, but every
call site would have a SwitchPoint for the target class and all of its
superclasses. That will be more problematic for cold performance than
JRuby's way, but less overhead when invalidating. I'm not sure which
trade-off is better.


I will have to test it out in the future.


We also use this invalidation mechanism when calling dynamic methods
from Java (since we also use call site caches there) but those sites are
not (yet) guarded by a SwitchPoint.


yes, we have a very few cases like this as well.

[...]

With recent improvements to MH boot time and cold performance, I've
started to use indy by default in more places, carefully measuring
startup overhead along the way. I'm well on my way toward having fully
invokedynamic-aware jitted code basically be all invokedynamics.


invokedynamic by default is the way to go ;)


It is also good to hear that the old "once invalidated, it will not be
optimized again - ever" is no longer valid.

And hopefully it will stay that way as long as we keep making noise :-)


indeed ;)

bye Jochen



Re: EXT: Re: series of switchpoints or better

2016-10-05 Thread John Rose
On Oct 5, 2016, at 12:45 PM, Charles Oliver Nutter  wrote:
>  
> It is also good to hear that the old "once invalidated, it will not be
> optimized again - ever" is no longer valid.
> 
> And hopefully it will stay that way as long as we keep making noise :-)

Go ahead, be that way!


Re: EXT: Re: series of switchpoints or better

2016-10-05 Thread Charles Oliver Nutter
On Wed, Oct 5, 2016 at 1:36 PM, Jochen Theodorou  wrote:

> If I hear Remi saying volatile read... then it does not sound free to me
> actually. In my experience volatile reads still present inlining barriers.
> But if Remi and all of you tell me it is still basically free, then I will
> not look too much at the volatile ;)
>

The volatile read is only used in the interpreter.

> In Groovy we use SwitchPoint as well, but only one for the whole meta class
> system, which it seems could clearly be improved. Having a SwitchPoint per
> method is actually a very interesting approach I would not have considered
> before, since it means creating a ton of SwitchPoint objects. Not sure if
> that works in practice for me, since it is difficult to make a switchpoint
> for a method that does not exist in the super class but may come into
> existence later on - still, it seems I should be considering this.
>

I suspect Groovy developers are also less likely to modify classes at
runtime? In Ruby, it's not uncommon to keep creating new classes or
modifying existing ones at runtime, though it is generally discouraged (all
runtimes suffer).


> cold performance is a consideration for me as well though. The heavy
> creation time of MethodHandles is one of the reasons we do not use
> invokedynamic as much as we could... especially considering that creating a
> new cache entry via runtime class generation and still invoking the method
> via reflection is actually faster than producing one of our complex method
> handles right now.
>

Creating a new cache entry via class generation? Can you elaborate on that?
JRuby has a non-indy mode, but it doesn't do any code generation per call
site.


> As for Charles' question:
>
>> Can you elaborate on the structure? JRuby has 6-deep (configurable)
>> polymorphic caching, with each entry being a GWT (to check type) and a SP
>> (to check modification) before hitting the plumbing for the method itself.
>>
>
> Right now we use a 1-deep cache with several GWTs (checking receiver type
> and argument types) and one SP, plus several transformations. My goal is of
> course also 6-deep polymorphic caching in the end; the motivation for it was
> just not very high before. If I use several SwitchPoints, then of course each
> of them would be there for each cache entry. How many depends on the receiver
> type, but at least one for each super class (and interface).
>

Ahh, so when you invalidate, you only invalidate one class, but every call
site would have a SwitchPoint for the target class and all of its
superclasses. That will be more problematic for cold performance than
JRuby's way, but less overhead when invalidating. I'm not sure which
trade-off is better.

We also use this invalidation mechanism when calling dynamic methods from
Java (since we also use call site caches there) but those sites are not
(yet) guarded by a SwitchPoint.


> To my horror I just found one piece of code commented with:
> //TODO: remove this method if possible by switchpoint usage
>

With recent improvements to MH boot time and cold performance, I've started
to use indy by default in more places, carefully measuring startup overhead
along the way. I'm well on my way toward having fully invokedynamic-aware
jitted code basically be all invokedynamics.


> It is also good to hear that the old "once invalidated, it will not be
> optimized again - ever" is no longer valid.
>

And hopefully it will stay that way as long as we keep making noise :-)

- Charlie


Re: EXT: Re: series of switchpoints or better

2016-10-05 Thread Jochen Theodorou

On 05.10.2016 18:21, MacGregor, Duncan (GE Energy Connections) wrote:

I second (third?) Charlie and Remi’s comments. SwitchPoint per method has 
worked very nicely to reduce the amount of code invalidated by meta-programming 
shenanigans. You could go further and try for a class-and-method switch point, 
but that makes it harder to eliminate class checks or use CHA.

The downside of all this kind of thing is that when stuff is invalidated it’s 
often fairly heavy weight, so it’s worth putting some thought into designing 
things to minimise the amount of code which will be invalidated when you flip a 
SwitchPoint and only invalidating things that really need it (that’s where a 
switch point per method often pays off).


well you know... even if people tell you it is basically for free, it 
usually is not, which is why I wanted a confirmation.


If I hear Remi saying volatile read... then it does not sound free to me 
actually. In my experience volatile reads still present inlining 
barriers. But if Remi and all of you tell me it is still basically free, 
then I will not look too much at the volatile ;)


In Groovy we use SwitchPoint as well, but only one for the whole meta
class system, which it seems could clearly be improved. Having a
SwitchPoint per method is actually a very interesting approach I would
not have considered before, since it means creating a ton of SwitchPoint
objects. Not sure if that works in practice for me, since it is difficult
to make a switchpoint for a method that does not exist in the super
class but may come into existence later on - still, it seems I should be
considering this.


cold performance is a consideration for me as well though. The heavy 
creation time of MethodHandles is one of the reasons we do not use 
invokedynamic as much as we could... especially considering that 
creating a new cache entry via runtime class generation and still 
invoking the method via reflection is actually faster than producing one 
of our complex method handles right now.


As for Charles' question:

Can you elaborate on the structure? JRuby has 6-deep (configurable) polymorphic 
caching, with each entry being a GWT (to check type) and a SP (to check 
modification) before hitting the plumbing for the method itself.


Right now we use a 1-deep cache with several GWTs (checking receiver
type and argument types) and one SP, plus several transformations. My
goal is of course also 6-deep polymorphic caching in the end; the
motivation for it was just not very high before. If I use several
SwitchPoints, then of course each of them would be there for each cache
entry. How many depends on the receiver type, but at least one for each
super class (and interface).
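
In java.lang.invoke terms such an entry would look roughly like the sketch
below (illustrative names only, not our actual code): a GWT on the receiver
class in front of the selected target, wrapped in one SwitchPoint per class
in the hierarchy, so that flipping any of them sends the call site back to
the re-selection fallback.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

class GuardedEntrySketch {
    static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();

    // guard: is the receiver exactly the class this entry was built for?
    static boolean receiverIs(Class<?> expected, Object receiver, Object[] args) {
        return receiver != null && receiver.getClass() == expected;
    }

    // build one cache entry; target and fallback are both of type
    // (Object receiver, Object[] args) -> Object
    static MethodHandle cacheEntry(Class<?> receiverClass,
                                   MethodHandle target,
                                   MethodHandle fallback,
                                   SwitchPoint[] classGuards)
            throws ReflectiveOperationException {
        MethodHandle test = LOOKUP.findStatic(GuardedEntrySketch.class, "receiverIs",
                MethodType.methodType(boolean.class, Class.class,
                                      Object.class, Object[].class))
                .bindTo(receiverClass);
        MethodHandle entry = MethodHandles.guardWithTest(test, target, fallback);
        // one SwitchPoint per super class/interface of the receiver class;
        // invalidating any of them drops the site back to the fallback,
        // which re-selects the method and installs a fresh entry
        for (SwitchPoint sp : classGuards) {
            entry = sp.guardWithTest(entry, fallback);
        }
        return entry;
    }
}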


To my horror I just found one piece of code commented with:
//TODO: remove this method if possible by switchpoint usage

which means we are currently using switchpoint as well as pinging?! 
Commit incoming ;)


It is also good to hear that the old "once invalidated, it will not be
optimized again - ever" is no longer valid.


thx a lot guys
Jochen


Re: EXT: Re: series of switchpoints or better

2016-10-05 Thread MacGregor, Duncan (GE Energy Connections)
I second (third?) Charlie and Remi’s comments. SwitchPoint per method has 
worked very nicely to reduce the amount of code invalidated by meta-programming 
shenanigans. You could go further and try for a class-and-method switch point, 
but that makes it harder to eliminate class checks or use CHA.

The downside of all this kind of thing is that when stuff is invalidated it’s 
often fairly heavy weight, so it’s worth putting some thought into designing 
things to minimise the amount of code which will be invalidated when you flip a 
SwitchPoint and only invalidating things that really need it (that’s where a 
switch point per method often pays off).

Duncan.

From: mlvm-dev <mlvm-dev-boun...@openjdk.java.net> on behalf of Charles Oliver Nutter <head...@headius.com>
Reply-To: Da Vinci Machine Project <mlvm-dev@openjdk.java.net>
Date: Wednesday, 5 October 2016 at 15:00
To: Da Vinci Machine Project <mlvm-dev@openjdk.java.net>
Subject: EXT: Re: series of switchpoints or better

Hi Jochen!

On Wed, Oct 5, 2016 at 7:37 AM, Jochen Theodorou <blackd...@gmx.org> wrote:
If the meta class for A is changed, all handles operating on instances of A may
have to reselect. The handles for B and Object need not be affected. If the
meta class for Object changes, I need to invalidate all the handles for A, B
and Object.

This is exactly how JRuby's type-modification guards work. We've used this 
technique since our first implementation of indy call sites.

Doing this with switchpoints means probably one switchpoint per metaclass and a 
small number of meta classes per class (in total 3 in my example). This would 
mean my MethodHandle would have to get through a bunch of switchpoints, before 
it can do the actual method invocation. And while switchpoints might be fast it 
does not sound good to me.

From what I've seen, it's fine as far as hot performance. Adding complexity to
your handle chains likely impacts cold perf, of course.

Can you elaborate on the structure? JRuby has 6-deep (configurable) polymorphic 
caching, with each entry being a GWT (to check type) and a SP (to check 
modification) before hitting the plumbing for the method itself.
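
Roughly, such a chain folds into MethodHandles like the sketch below
(illustrative only, not JRuby's actual code): each entry tests the type,
checks its SwitchPoint, and otherwise falls through to the next entry, with
the slow path (lookup plus cache update) at the end.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.SwitchPoint;
import java.util.List;

class PolymorphicChainSketch {
    // one cache entry: a type test, a modification SwitchPoint, and the target
    static final class Entry {
        final MethodHandle typeTest;   // (Object,Object[])boolean
        final SwitchPoint modGuard;    // flipped when the cached (meta)class changes
        final MethodHandle target;     // (Object,Object[])Object
        Entry(MethodHandle typeTest, SwitchPoint modGuard, MethodHandle target) {
            this.typeTest = typeTest; this.modGuard = modGuard; this.target = target;
        }
    }

    // fold up to N entries into one handle: each entry falls through to the
    // next, the last one falls through to the slow path
    static MethodHandle buildChain(List<Entry> entries, MethodHandle slowPath) {
        MethodHandle chain = slowPath;
        for (int i = entries.size() - 1; i >= 0; i--) {
            Entry e = entries.get(i);
            MethodHandle guarded = e.modGuard.guardWithTest(e.target, chain);
            chain = MethodHandles.guardWithTest(e.typeTest, guarded, chain);
        }
        return chain;
    }
}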

I will say that using SwitchPoints is FAR better than our alternative 
mechanism: pinging the (meta)class each time and checking a serial number.
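
For contrast, the pinging guard looks roughly like this (illustrative names,
not JRuby's actual classes): the serial number is re-read and compared on
every single call, whereas an untouched SwitchPoint guard can be folded away
by the JIT.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class SerialPingSketch {
    // hypothetical metaclass with a monotonically increasing modification count
    static final class MetaClass {
        volatile int serial;
    }

    // re-checked on every invocation of the call site
    static boolean serialUnchanged(MetaClass mc, int expected, Object receiver, Object[] args) {
        return mc.serial == expected;
    }

    // target and fallback are both of type (Object receiver, Object[] args) -> Object
    static MethodHandle pingedEntry(MetaClass mc, MethodHandle target, MethodHandle fallback)
            throws ReflectiveOperationException {
        MethodHandle test = MethodHandles.lookup()
                .findStatic(SerialPingSketch.class, "serialUnchanged",
                        MethodType.methodType(boolean.class, MetaClass.class, int.class,
                                              Object.class, Object[].class))
                .bindTo(mc);
        test = MethodHandles.insertArguments(test, 0, mc.serial); // capture current serial
        return MethodHandles.guardWithTest(test, target, fallback);
    }
}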

Or I can do one switchpoint for all method handles in the system, which makes
me wonder whether the call site ever gets JITted again after a meta class
change. That later performance penalty is also not very attractive to me.

We have fought to keep the JIT from giving up on us, and I believe that as of 
today you can invalidate call sites forever and the JIT will still recompile 
them (within memory, code cache, and other limits of course).

However, you'll be invalidating every call site for every modification. If the 
system eventually settles, that's fine. If it doesn't, you're going to be stuck 
with cold call site performance most of the time.

So what is the way to go here? Or is there an even better way?

I strongly recommend the switchpoint-per-class granularity (or finer, like 
switchpoint-per-class-and-method-name, which I am playing with now).
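
As a purely illustrative sketch (not JRuby's code), class-and-method-name
granularity boils down to a registry like the one below; it also sidesteps
the "method does not exist yet" problem, since a SwitchPoint can be handed
out for a name before any method of that name exists:

import java.lang.invoke.SwitchPoint;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// illustrative registry: one SwitchPoint per (class, method name) pair
class MethodSwitchPoints {
    private final Map<Class<?>, Map<String, SwitchPoint>> points = new ConcurrentHashMap<>();

    // call sites ask for the SwitchPoint guarding a given class/name pair;
    // it may be created before any method of that name exists
    SwitchPoint get(Class<?> owner, String name) {
        return points.computeIfAbsent(owner, c -> new ConcurrentHashMap<>())
                     .computeIfAbsent(name, n -> new SwitchPoint());
    }

    // called when a method of that name is added to or redefined on the class:
    // only call sites depending on this exact pair get invalidated
    void invalidate(Class<?> owner, String name) {
        Map<String, SwitchPoint> perClass = points.get(owner);
        if (perClass == null) return;
        SwitchPoint old = perClass.replace(name, new SwitchPoint());
        if (old != null) {
            SwitchPoint.invalidateAll(new SwitchPoint[] { old });
        }
    }
}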

- Charlie


