Re: A simple PIC api

2015-03-30 Thread Mark Roos
Hi Jochen

On why I am looking at projections for
behavior determination.

In my case I use an object reference
in the receiver to determine the method to execute. One
could say that this is a projection
from the class to an address which is an integer. So in this
case computing the projection is the
same as testing for a class match. However I see this as
limiting. The address could be
64 bits but I only have 5000 classes so many of the bits are
not used. My theory is that if
I replace the object reference with a long ( call it a bit vector )
and then compute the mapping of the
class behavior to some set of bits then I can compute
the target behavior by logical operations
on the projections for each arg. Since these would
all be ops on longs ( or ints maybe)
they would be fast to perform.

So my current efforts at analysis is
to handle the following use cases.

For call sites where the number of receivers
it less than some small number I plan to just
use the standard compare the class behavior
approach. This could be faster if the item
compared is a integer vs a reference.

For cases where there are only a few
implementations of a method but many receivers. I plan
to use some logical combination of bits
so select the method.

For cases where the class of an argument
plus the receiver is used. ( specialization ). Some
bits again.

For the case of many methods and many
receivers.  Not sure about this yet. This is usually
some method like 'name' which returns
a constant. Perhaps moving the constant to an instance
var would trade memory size for performance.

I also believe that I can determine
these projections via a static analysis plus some profiling.
One question will be how much does the
profiling help.

It may be true that a tree structured
test is just as effective as manipulations of a bit vector. It
may also be true that the cases which
could benefit from projections do not impact the apps
performance. Makes for a nice
analysis.

In any case I agree with you on performance
being the goal.  Probably a optimal GWT chain
will in the end be all that is needed.


regards
mark





___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-18 Thread Jochen Theodorou

Am 16.03.2015 00:42, schrieb Mark Roos:

Jochen writes
 So far I have avoided using projections since I am not sure
about how to
do this. Basically I am missing a way to project a Class to an
usable
int.

After this discussion, reading Rémi's suggested paper and some more
analysis of my code base
I am beginning to think that a projection based PIC could be a good
solution.  I agree with your
concern about how such a projection would be computed but I feel that
there is a solution for
that.  Hopefully it will hold up in reality.


I was waiting a while to see if maybe somebody else jumps in and gives a 
suggestion...


I had a bad experience once with trying to make a key for a method 
selection for caching in a hashmap. This is kind of an projection as 
well. But anyway... in the end the caching there costs almost as much as 
doing the method selection and the gain is near zero. An we are here the 
focus is not the cost of method selection so much as more an efficient 
way to compute an int out of receiver and arguments , which will be more 
efficient than a cascade of guards where each checks the classes for a 
cached call. In both cases you would have to get the classes of the 
arguments and the receiver once. And in both cases you would need a loop 
to check over the classes, unless you can be sure that your projection 
will be bimorphic.  And I doubt that would be the case. At least not 
bimorphic and efficient to compute at the same time.


So in the end it is a question of what can be optimized better.

bye Jochen
--
Jochen blackdrag Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-15 Thread Mark Roos
Jochen writes
So far I have avoided using projections since I am not sure about 
how to 
do this. Basically I am missing a way to project a Class to an 
usable 
int.

After this discussion, reading Rémi's suggested paper and some more 
analysis of my code base 
I am beginning to think that a projection based PIC could be a good 
solution.  I agree with your 
concern about how such a projection would be computed but I feel that 
there is a solution for 
that.  Hopefully it will hold up in reality.

regards
mark___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-13 Thread Jochen Theodorou

Am 12.03.2015 19:37, schrieb Mark Roos:

Jochen

The comment on the test part of the pic is interesting. Since I am
looking at multimethods
I would like to have a better  understanding of how you decide which
code to dispatch at a site.


We currently have no PIC in Groovy. The naive implementation would be 
using the guards I use to check for the validity of my current method 
selection. But that means several guards to check the receiver and each 
argument as well as a switchpoint to ensure the validity of the meta 
class. So for a N argument call we have up-to N+1 guards. The guard will 
ensure the value is of the expected class and not null. Since I 
transport static information if available I can skip the guard for 
primitives. I could in theory skip the class check for final classes, 
but since null would be still a legal value and since it would 
potentially change the selected method, I would still have a guard. What 
I could do further, which I haven't done yet, is to check the overloads 
of the method. If there are none, I need only one guard for the 
receiver. And depending on the signatures I could also omit single 
guards. Problem is that the internal structures we currently have, have 
not been written to make this happen efficiently, which is why I did not 
go further yet.



My pic suggestion assumes that one test method is applied to the
arguments and its result
used to select the code to execute.   This may be too limiting to be the
general case.


But you made me think... considering a guard like this:

boolean testArguments(Object[] args, Class[] types) {
  for (int i=0; itypes.length; i++) {
if (types[i]!=null) {
if (args[i]==null || args[i].getClass()!=types[i]) return false;
} else {
if (args[i]!=null) return false;
  }
  return true;
}

plus some dropping of arguments to skip parts I don't want to guard, I 
could indeed go with one test handle. For null and class. The question 
though is how the performance of this would be. (Haven't tested it)



The way I reason about a pic is that it is presented the current stack
args and if it does
not see a match the stack and the site are passed to the fallback.  The
fallback has
two responsibilities, to find the code for the current args and to
generate a test which
when applied to args in the future will dispatch the same code.


Yes, that was the same in my suggestion


I see
this test as
a pattern match against some facet of the args.  Since I use a chain of
GWTs each guard
can have a different test/match.  I am looking at some form of bit
vector for the facets and the
pattern to match ( as in the paper Rémi suggested) so I don't have to
have a complex tree
like test.


Yes, a tree of tests should be avoided, I totally agree. Same for the 
code I have shown above probably.


So far I have avoided using projections since I am not sure about how to 
do this. Basically I am missing a way to project a Class to an usable 
int. I guess if I wanted to I could use the hashcode sum as a 
potentially quick first test and then continue from there.



Perhaps my original thought of just declaring the site to be a pic and
letting the jvm do its
best optimizing the target will be the best.


needs testing I guess

bye Jochen

--
Jochen blackdrag Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-12 Thread Jochen Theodorou

Am 12.03.2015 00:13, schrieb Mark Roos:

Hi Jochen

I have to think about yours some more but I thought I would share mine.
  I condensed it to
make it easier to explain.


my pleasure

[…]

I extended callsite to hold a bunch of values one of which is the depth.
And my model for the cache is up to 10 chained GWTs after that I drop the
chain and start over.

When I get a miss in the cache off to the fallback ( look below to see
how this MH is made)
*static*RtObject fallbackSelf(RtCallSite site, Object... args)

For each guard I create a test MH which takes all of the args and
returns a bool.
Since I only look at the TOS it drops the rest,  yours is more
complicated.  This can be
different for each guard in my case.
 RtObject rcvr=(RtObject)args[args.length- 1];  // top of stack
test=MethodHandles./insertArguments/(test, 0, rcvr.classField());  //
the behavior field
  test=MethodHandles./dropArguments/(test, 0, site.dropArray());
  // match test signature

   then finds the code based on the site and the receiver in this case
 MethodHandle newTarget=rcvr.lookupSelf(site.getSelector());

   makes a new GWT with the existing chain as its slow path


my main problem is that in most cases a single test won't do. In many 
cases I have to check at least the class of the receiver as well as of 
each argument. That makes for a 2 arg call a least 3 tests. But I cannot 
really express a Java like . Instead I have to use test number 1, with 
slow path in the false case and the real invocation target in the true 
case. Then for each additional test I use the root of the MethodHandle 
tree as the true case and the slow path as fallback.


Or is there a better way to do this? Maybe that would be a nice addition 
to the MethodHandles API actually. Something to be able to express logic 
and/or. I imagine this could simplify code generation in the JVM and 
reduce the number of Objets you need.



   if the cache is full will make the bootstrap MH (see below) the
target dropping the existing cache
***if*(site._depth 10)
 MethodHandle gwt=MethodHandles./guardWithTest/(test, newTarget,
site.getRealTarget());
site._depth= site._depth+ 1;  // bump the depth

   inserts it at the start of the chain by making it the new target
site.setRealTarget(gwt);

   and invokes it ( has to spread the args here as they were compressed
on the way to the fallback)
 MethodHandle invoke=newTarget.asSpreader(Object[].*class*,
site.getAirity());
result=(RtObject)invoke.invokeExact((Object[])args);


yeah, in Remi's example it falls back ot a vtable like approach. But 
such things would in my proposal go into the megamorphicFallback and 
there you can of course easily make a handle that will reset the cache


Of course the API I suggested is more complicated, because it tries to 
abstract. Remi's code for example works and is fairly small. But since 
the goal was to have a certain API the JVM can look at in more detail, 
since it will follow a certain structure I tried to come up with 
something we all can use. It is far from perfect of course and surely 
needs simplification here and there or does not consider other things. I 
only thought I give that idea to the list to have some base to work 
with, because I am not sure how I would be able to use your suggest API 
for Groovy for example


bye Jochen


--
Jochen blackdrag Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-12 Thread MacGregor, Duncan (GE Energy Management)
We did have optional instrumentation to maintain the PIC counts, and used
that to guide our choice of ordering, but we didn¹t use it on a per PIC
level to do anything at run time, it was just a case of gathering a lot of
data and printing out the stats. It did add some overhead, but I think
Vlad¹s work for profiling of GWTs show that it could be done with an
efficient API. How much real benefit it would offer though, I don¹t know.

Duncan.

On 11/03/2015 17:48, Marrows, George A (GE Energy Management)
george.marr...@ge.com wrote:

 so the order of the handles is never changed to for example trying the
last one first or the one with the most hits recently. Is it not worth
the trouble?

We found a useful gain at one point from reordering GWT chains so that
the new MH was put on the end of the chain and the chain as a whole was
therefore sorted chronologically, with first seen at the top. That
obviously requires a bit of book-keeping to arrange. We haven't
experimented with maintaining our own hit counts, and in fact I think we
last checked the perf gain from reversing the chain on Java 7, so results
may no longer be relevant.

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-12 Thread Mark Roos
Jochen

The comment on the test part of the pic is interesting. Since I am looking 
at multimethods 
I would like to have a better  understanding of how you decide which code 
to dispatch at a site. 
My pic suggestion assumes that one test method is applied to the arguments 
and its result 
used to select the code to execute.   This may be too limiting to be the 
general case.

The way I reason about a pic is that it is presented the current stack 
args and if it does
not see a match the stack and the site are passed to the fallback.  The 
fallback has
two responsibilities, to find the code for the current args and to 
generate a test which
when applied to args in the future will dispatch the same code.  I see 
this test as
a pattern match against some facet of the args.  Since I use a chain of 
GWTs each guard
can have a different test/match.  I am looking at some form of bit vector 
for the facets and the
pattern to match ( as in the paper Rémi suggested) so I don't have to have 
a complex tree
like test. 

Perhaps my original thought of just declaring the site to be a pic and 
letting the jvm do its
best optimizing the target will be the best.

ragards
mark___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-12 Thread Mark Roos
Thanks Rémi,  I was looking for a paper like that.  Not for multimethods 
but for a way to 
improve code reuse across a hierarchy.  Will savor it later with a fine 
pinot :)

What I was thinking about for multi methods was a simpler tree like 
approach.
http://dl.acm.org/citation.cfm?id=28732

In my proposed pic the test method handle examines the arguments and 
returns a behavior
token.  By using a custom test for that call site I believe that the 
overhead will be acceptable.
But I don't want to recalculate it N times for N guards. Of course I will 
be able to implement
this with just the current MH support.

If you want efficiency, you still have the issue that to be 
efficient you need to inline the code of all method 
handles inside the code of the caller and you can not do that if 
there 
are too many possible implementations i.e. if the callsite is 
megamorphic.

My thoughts around megamorphic sites is to note that they very often have 
only a few implementations ( like Object at: ).
I was thinking about inverting the test logic when a site gets too large 
and I know there are less than N implementations.
The paper you referenced may help here.

Currently, if a callsite is monomorphic and you have a way to 
prove it upfront, you can use a switchpoint

Agree,  but all sites can be assumed to be such until proven otherwise 
(80%)   It would be nice if a single GWT would be no more
expensive.

If you have a polymorphic callsite, guards are your savior, it's 
work pretty well until you 
have too many guards (it's a linear scan after all).

Yes this is my current model

If you have a megamorphic callsite, you can use an exactInvoker + 
a fold, but you basically give up on inlining.

This may be a good fallback

The other solution is to ask the VM to do a tableswitch on the 
method handle value (or on an int that represents 
the method handle value). We have no specific method handle for 
this case so you will have to generate bytecode.

I was going to try this ( I already do my own bytecodes ) so thanks for 
the reinforcement.  My roadblock here is how to
convert the 20K behaviors to the int to use for the table switch, it would 
be very sparse.  I think due to the rarity of sites
with more than 10 receivers that I may just take the hit of letting them 
be rebuilt when they pass some size.  This follows
an observation that often sites which are megamorphic in the long term are 
not over shorter time spans.

So let me try to understand what you want, you want a combiner 
that will do an inlining cache or a polymorphic inlining 
cache with guards if there are few possible targets, this combiner 
will be able re-arrange the guard checks because the user 
guarantee that the check do no side effect. And if there are too 
many guards, the combiner will use a tableswitch to still try 
to inline targets at callsite until there are so many possible 
targets that the cominer will in that case only use an 
exactInvoker + a fold.

how far am I from what you are thinking ?

I think I would be happy with that. But I would also be happy with a 
simpler building block which satisfies a subset of
these requirements with blazing speed.

My minimum is a construct which 
manages a short ( 1 to ?? ) inline set of GWTs and turns them into 
compare/branch instructions.

takes a method handle which when passed the stack args returns an 
integer to use for the selection
so only one execution of the test method

if there is a miss or overflow passes to the fallback mh for a 
lookup

can optionally insert the new mh into the front of the managed set 
or just execute it and discard it 
( handles megamorphic  or a table lookup)  based on the reply from 
the fallback ( a tuple  of int,mh)

all paths are assumed fast

always inlined

the optimizer doing the right thing


thanks again
mark




___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-12 Thread Mark Roos
Rémi your suggested paper and comments caused me to take a look at my code
base some more.  What I found was that for a given selector+arity 93% of 
them
have 5 implementations or less ( across 2000 classes and 25K methods). 
Combining
this with my prior observations that 99% of the call sites have less than 
5 receivers
implies to me that a 5 element pic with appropriate pattern matching could 
cover
most everything.  The key will be how to come up with a pattern (bit 
vector?) from
the args which can be tested quickly.  I think I may be able to statically 
compute this
given a 64 bit field as the arg behavior identifier.

Thanks again,  off to follow the cites.

mark

fyi 60% had only one implementation___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-11 Thread Jochen Theodorou

Am 11.03.2015 15:37, schrieb Remi Forax:
[...]

Sometime, yes, if you count the frequency of each branch by example,
here is another code that implements a bimorphic inlining cache that
expand to a dispatch table local to a callsite if necessary, I think you
can adapt this code to implement what Mark want quite easily

https://code.google.com/p/jsr292-cookbook/source/browse/trunk/bimorphic-cache/src/jsr292/cookbook/bicache/RT.java


but obviously the DispatchMap case is not something that will be nice to 
performance, or not? I am was wondering how performance would drop if I 
have an array for counting invocations for each part of the cache as 
well as total number of counts and how that would impact performance. 
Because obviously on each invocation I would have to call a little 
method that does the counting as well as an additional guard to cause 
the reorganization once a certain threshold is reached... And I know 
every guard will lower performance.


Anyway... based on Marks initial post as well as Remi's idea I would 
come up with the following... which is mostly an abstraction and 
refactoring of Remi's code.


class InlineCacheT extends MutableCallSite {
int MAX_DEPTH = 5
int depth
T callsite
...
InlineCache(T callsite) {...}
public void installCache()
public final Object internalFallback(...)
public abstract MH fallback(T callsite, Object[] args)
public abstract MH megamorphicFallback(T callsite, Object[] args)
public final MH insert(MH target, MH... tests)
}

in case of a cache miss and max depth not reached fallback is called and 
supposed to call insert for the new target, the method will do the 
integration into the cache, and return a handle you can return from 
fallback. megamorphicFallback would be called in case of maximum depth 
is reached and install for example a virtual dispatch or whatever else. 
installCache would the cache logic as target of the callsite. 
internalFallback would increase the count as well as calling either 
fallback or megamorphicFallback. I didn't want to let the cache extend 
the callsite, since others may need the callsite class for their own 
purposes. Separation could allow for for example using a different cache 
type.


Taking your code from 
https://code.google.com/p/jsr292-cookbook/source/browse/trunk/inlining-cache/src/jsr292/cookbook/icache/RT.java 
basically line 42 would be in internalFallback 43-45 in 
megamorphicFallback, 46-47 in internalFallback, 50-57 in fallback, 56-57 
would be the test given to insert, 59-63 in internalFallback, and of 
course we would not link to fallback, but to internalFallback


What do you guys think?

bye Jochen

--
Jochen blackdrag Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


RE: A simple PIC api

2015-03-11 Thread Marrows, George A (GE Energy Management)
 so the order of the handles is never changed to for example trying the last 
 one first or the one with the most hits recently. Is it not worth the trouble?

We found a useful gain at one point from reordering GWT chains so that the new 
MH was put on the end of the chain and the chain as a whole was therefore 
sorted chronologically, with first seen at the top. That obviously requires a 
bit of book-keeping to arrange. We haven't experimented with maintaining our 
own hit counts, and in fact I think we last checked the perf gain from 
reversing the chain on Java 7, so results may no longer be relevant.



-Original Message-
From: mlvm-dev [mailto:mlvm-dev-boun...@openjdk.java.net] On Behalf Of Jochen 
Theodorou
Sent: 11 March 2015 14:00
To: Da Vinci Machine Project
Subject: Re: A simple PIC api

Am 11.03.2015 14:46, schrieb Jochen Theodorou:
[...]
 I really should write a PIC implementation for Groovy :(

actually picking up on that maybe you guys could give some advice about how 
to structure a PIC internally.

https://code.google.com/p/jsr292-cookbook/source/browse/trunk/inlining-cache/src/jsr292/cookbook/icache/RT.java

uses basically guardWithTest, so the order of the handles is never changed to 
for example trying the last one first or the one with the most hits recently. 
Is it not worth the trouble?

bye Jochen

--
Jochen blackdrag Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc For Groovy programming 
sources visit http://groovy-lang.org

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-11 Thread Remi Forax


On 03/11/2015 11:12 PM, Mark Roos wrote:

From Jochen

Do I also understand right, that your test for checking if the current
target is still valid is limited to only the receiver?

Well yes and no.  In my case the test examines all of the arguments on 
the stack and computes
an 'behavior' reference.  This reference is the head of a linked list 
of method dictionaries which
the fallback searches to locate the code for this callsite and that 
behavior.  The yes part is that
in my Smalltalk the behavior is stored in a field of the object on the 
top of the stack.  This happens

to be the receiver.

For the proposed api the test can examine the entire stack and 
callsite to determine the dispatch.
I do plan to support multi methods which could be an expensive test. 
 I just wanted to do it once

regardless of the number of guards.

I also do special
   things to not add guards if I can, since each guard costs... 
though the

   information I use there is static so far

I would expect in a perfect world for the guards to be cheap if the 
jvm knew the were part of

a pic.


Do you mean you plan to use something like this ?
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.37.594rep=rep1type=pdf
the result of the dispatch algorithm will be either a method handle 
corresponding to the implementation to call.


If you want efficiency, you still have the issue that to be efficient 
you need to inline the code of all method handles inside the code of the 
caller and you can not do that if there are too many possible 
implementations i.e. if the callsite is megamorphic.


Currently, if a callsite is monomorphic and you have a way to prove it 
upfront, you can use a SwitchPoint with no guard at all (think something 
like there is only one class that implement this interface).
If you have a polymorphic callsite, guards are your savior, it's work 
pretty well until you have too many guards (it's a linear scan after all).
If you have a megamorphic callsite, you can use an exactInvoker + a 
fold, but you basically give up on inlining.
The other solution is to ask the VM to do a tableswitch on the method 
handle value (or on an int that represents the method handle value). We 
have no specific method handle for this case so you will have to 
generate bytecode.


So let me try to understand what you want, you want a combiner that will 
do an inlining cache or a polymorphic inlining cache with guards if 
there are few possible targets, this combiner will be able re-arrange 
the guard checks because the user guarantee that the check do no side 
effect. And if there are too many guards, the combiner will use a 
tableswitch to still try to inline targets at callsite until there are 
so many possible targets that the cominer will in that case only use an 
exactInvoker + a fold.


Currently, this can not be implemented easily for two reasons,
- you may not want profile counters that profile the different branches 
to be included in the JITed code, so you need a way to specify a special 
method handle that will execute a method handle in the interpreter but 
that will not appear in JITed code.
- you want the VM to be able to generate a tableswitch which is not 
something the VM is able to do now.


how far am I from what you are thinking ?



regards
mark


cheers,
Rémi

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-11 Thread Mark Roos
Hi Jochen
uses basically guardWithTest, so the order of the handles is never 
changed to for example trying the last one first or the one with the 
most hits recently. Is it not worth the trouble?

In my case I always add the new gwt to the head of the chain.  This fits 
my use case where
the most recent tends to be the most common at the moment.  It was also 
very easy to code
using method handles directly.  Not sure how you would do it in java. Also 
when it gets too
deep I just drop the chain and start over.

regards
mark___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-11 Thread Remi Forax


On 03/11/2015 11:12 PM, Mark Roos wrote:

Remi commented

I think you can adapt this code to implement what Mark want quite easily

I don't disagree that pics are easy to code, my premise is that with a 
construct such I
I proposed the jvm would do a better job of optimizing.  Especially 
taking into account

invalidation, multi core memory model and volatile state.


You have to be more specific, I don't understand exactly what you mean ?



If you don't think the performance gains would be worth the effort 
then I will defer

to you and beat some other drum.


:)



regards
mark


cheers,
Rémi

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-11 Thread Mark Roos
From Jochen

Do I also understand right, that your test for checking if the current 
target is still valid is limited to only the receiver?

Well yes and no.  In my case the test examines all of the arguments on the 
stack and computes
an 'behavior' reference.  This reference is the head of a linked list of 
method dictionaries which
the fallback searches to locate the code for this callsite and that 
behavior.  The yes part is that
in my Smalltalk the behavior is stored in a field of the object on the top 
of the stack.  This happens
to be the receiver.

For the proposed api the test can examine the entire stack and callsite to 
determine the dispatch. 
I do plan to support multi methods which could be an expensive test.  I 
just wanted to do it once 
regardless of the number of guards.

I also do special 
things to not add guards if I can, since each guard costs... 
though the 
information I use there is static so far

I would expect in a perfect world for the guards to be cheap if the jvm 
knew the were part of
a pic.

regards
mark___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-11 Thread Mark Roos
Remi commented

I think you can adapt this code to implement what Mark want quite easily

I don't disagree that pics are easy to code, my premise is that with a 
construct such I
I proposed the jvm would do a better job of optimizing.  Especially taking 
into account
invalidation, multi core memory model and volatile state. 

If you don't think the performance gains would be worth the effort then I 
will defer
to you and beat some other drum.

regards
mark___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-11 Thread Julien Ponge
The overhead is negligible in my micro-benchmarks for monomorphic and
trimorphic call sites compared to regular Java virtual calls.

- Julien 

On Tue, Mar 10, 2015, at 18:25, Mark Roos wrote:
 From Julian
 
  That being said performance of the PIC construct is very good in our 
 case.
 
 
 Do you have any quantitative data on your call site performance vs number 
 of targets?
 
 thx
 
 mark
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-11 Thread Mark Roos
Hi Rémi,

I assume you want me to be more specific about my concerns on:
taking into account invalidation, multi core memory model and 
volatile state. 

My model is that I have a GWT chain, a cache, and  a fallback which is 
updating the chain, and more
than one core using the same callsite.

So I have shared mutable state in the callsite holding things like depth, 
a list of receivers and the
current target.  It is likely that the two threads could be setting 
different targets etc.  So I worry
about syncing access.  But that seems like it could be expensive even 
though the possibility of
a collision would be rare.  Seems like a place for STM :).  I currently 
lock the callsites mutable parts
but I am not sure if I can lock the entire site or if I even should. 
Obviously most to the time the
site can be shared with no issues.  Perhaps its enough to lock the 
fallback?

The volatile state question is about the sharing of the call sites across 
threads.  It really ok if the
two threads create different caches as the code executed is always 
correct.  But when I change code
on the fly the caches all need to update so I need to make the cache 
volatile.  I can do this now with
a switch point so its not a big deal.

My invalidation concern is that as I update my chain that the jvm decides 
that I no longer deserve to
be inlined.  Especially when I drop the entire chain to rebuild it many 
times.

As I use object references for my test comparison I wonder if as they need 
to be tracked by the
GC does this add overhead.

I hope these are invalid concerns but if they were not I would think that 
letting the jvm know this was a pic
would allow it to handle them with a minimum loss in performance.

regards
mark___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-11 Thread Mark Roos
Hi Jochen

I have to think about yours some more but I thought I would share mine.  I 
condensed it to 
make it easier to explain. 

regards
mark

I extended callsite to hold a bunch of values one of which is the depth.
And my model for the cache is up to 10 chained GWTs after that I drop the
chain and start over.

When I get a miss in the cache off to the fallback ( look below to see how 
this MH is made)
static RtObject fallbackSelf(RtCallSite site, Object... args)

For each guard I create a test MH which takes all of the args and returns 
a bool.
Since I only look at the TOS it drops the rest,  yours is more 
complicated.  This can be
different for each guard in my case.
RtObject rcvr=(RtObject)args[args.length - 1];  // top of stack
test=MethodHandles.insertArguments(test, 0, rcvr.classField()); // 
the behavior field
   test=MethodHandles.dropArguments(test, 0, site.dropArray()); // 
match test signature

  then finds the code based on the site and the receiver in this case
MethodHandle newTarget=rcvr.lookupSelf(site.getSelector());

  makes a new GWT with the existing chain as its slow path
  if the cache is full will make the bootstrap MH (see below) the target 
dropping the existing cache
if(site._depth  10)  
MethodHandle gwt=MethodHandles.guardWithTest(test, newTarget, site
.getRealTarget());
site._depth = site._depth + 1;  // bump the depth

  inserts it at the start of the chain by making it the new target
  site.setRealTarget(gwt);

  and invokes it ( has to spread the args here as they were compressed on 
the way to the fallback)
MethodHandle invoke=newTarget.asSpreader(Object[].class, site
.getAirity());
  result=(RtObject) invoke.invokeExact((Object[])args);

You probably need to see the bootstrap to understand the invoke.  To limit 
myself to only one fallback signature
I bind the callsite and collect the args

 MethodHandle mh=lookup.findStatic(RtCallSite.class, 
fallbackSelf, mt); 
MethodHandle mh2=MethodHandles.insertArguments(mh, 0, site); // 
insert the call site
MethodHandle mh3=mh2.asCollector(RtObject[].class, site
.getAirity()); // combine the args into an array
site.setBootstrapTarget(mh3);___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-11 Thread Jochen Theodorou

Am 09.03.2015 18:51, schrieb Mark Roos:

I was thinking about a generic pic, easy to use but flexible and came up
with the following concept api.
By passing the callsite and the testValue around with allowing an
optional pic update I can envision
this as a nice building block for several caching strategies.  Of course
this is all based on my use
model which does not use primitives.  In summary it reduces the calling
of the test method to only
once and it supports the quick testing of a few targets.  Its also easy
to understand how to use without
using Charlies or Remis builders.


Do I also understand right, that your test for checking if the current 
target is still valid is limited to only the receiver?



to generate a PIC
 MethodHandle createPic(Callsite site, MethodHandle
getTestValue, MethodHandle fallback)

The getTestValue methodHandle returns the object used in the comparison
process
 Object getTestValue(Callsite site, Object... stackArgs)

The fallback returns a testValue and MH to use
 Object[] fallback(Callsite site, Object testValue, Object...
stackArgs)

 where the return value is [Object newTestValue][MethodHandle
newMH]  // use case for Tuple
   if(newTestValue != null) lookup.insert(newTest, newMH)
   newMh.invoke()
 This path should also be considered fast as the first step of a
fallback could be to check a cache of MH.

the lookup process could be a simple chain of compare/branch operations.
  And the mh replacement could be
a simple circular update.  This chain could be limited to less than 5.


I really should write a PIC implementation for Groovy :( Though it won't 
be as simple as this one. I need to check for arguments which can be 
primitives) and receiver as well as meta class (switchpoint for that 
one) So not only do I have to check more things, I also do special 
things to not add guards if I can, since each guard costs... though the 
information I use there is static so far.


bye Jochen

--
Jochen blackdrag Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-11 Thread Jochen Theodorou

Am 11.03.2015 14:46, schrieb Jochen Theodorou:
[...]

I really should write a PIC implementation for Groovy :(


actually picking up on that maybe you guys could give some advice 
about how to structure a PIC internally.


https://code.google.com/p/jsr292-cookbook/source/browse/trunk/inlining-cache/src/jsr292/cookbook/icache/RT.java

uses basically guardWithTest, so the order of the handles is never 
changed to for example trying the last one first or the one with the 
most hits recently. Is it not worth the trouble?


bye Jochen

--
Jochen blackdrag Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-10 Thread Julien Ponge
How is it different from Rémi's construct?

https://code.google.com/p/jsr292-cookbook/source/browse/trunk/inlining-cache/src/jsr292/cookbook/icache/RT.java

(we use that pattern in Golo)

- Julien

 On 09 Mar 2015, at 18:51, Mark Roos mr...@roos.com wrote:
 
 I was thinking about a generic pic, easy to use but flexible and came up with 
 the following concept api. 
 By passing the callsite and the testValue around with allowing an optional 
 pic update I can envision 
 this as a nice building block for several caching strategies.  Of course this 
 is all based on my use 
 model which does not use primitives.  In summary it reduces the calling of 
 the test method to only 
 once and it supports the quick testing of a few targets.  Its also easy to 
 understand how to use without 
 using Charlies or Remis builders. 
 
 
 to generate a PIC 
 MethodHandle createPic(Callsite site, MethodHandle getTestValue, 
 MethodHandle fallback) 
 
 The getTestValue methodHandle returns the object used in the comparison 
 process 
 Object getTestValue(Callsite site, Object... stackArgs) 
 
 The fallback returns a testValue and MH to use 
 Object[] fallback(Callsite site, Object testValue, Object... 
 stackArgs) 
 
 where the return value is [Object newTestValue][MethodHandle newMH]  
 // use case for Tuple 
 if(newTestValue != null) lookup.insert(newTest, newMH) 
 newMh.invoke() 
 This path should also be considered fast as the first step of a 
 fallback could be to check a cache of MH. 
 
 the lookup process could be a simple chain of compare/branch operations.  And 
 the mh replacement could be 
 a simple circular update.  This chain could be limited to less than 5. 
 
 regards 
 mark 
 
 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-10 Thread Mark Roos
From Julian
   How is it different from Rémi's construct?

Performance would be the hope.

My position has been that with a decent pic api the jvm would be able to 
optimize the
pic to a few test/branch instructions for the large majority of callsites.

For your use case do you have the same situation as I do with 99%+  of 
call sites having
less than 5 targets?

regards
mark___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: A simple PIC api

2015-03-10 Thread Julien Ponge
 For your use case do you have the same situation as I do with 99%+  of call 
 sites having 
 less than 5 targets? 

Yes.

We set the PIC-megamorphic threshold to 5 in Golo, in which case we degrade 
to a stable but slower map-based + invoker dispatch.

That being said performance of the PIC construct is very good in our case.

- Julien
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev