Re: Getting back into indy...need a better argument collector!

2021-04-02 Thread Charles Oliver Nutter
First attempt at a workaround seems to be a wash. I rolled back to my
older logic (that does not use a hand-crafted collector method) to
come up with a pure-MethodHandle workaround for asCollector. I came up
with this (using InvokeBinder):

```
// Allocate an arrayType instance sized to match the incoming Object[].
MethodHandle constructArray = Binder.from(arrayType, Object[].class)
    .fold(MethodHandles.arrayLength(Object[].class))
    .dropLast()
    .newArray();

// arraycopy the boxed Object[] into the freshly allocated typed array,
// then keep only the typed array as the result.
MethodHandle transmuteArray = Binder.from(arrayType, Object[].class)
    .fold(constructArray)
    .appendInts(0, 0, count)
    .permute(1, 2, 0, 3, 4) // (src, srcPos, dest, destPos, length)
    .cast(ARRAYCOPY.type().changeReturnType(arrayType))
    .fold(ARRAYCOPY)
    .permute(2) // keep only the typed array
    .cast(arrayType, arrayType)
    .identity();

// Box the trailing arguments into an Object[], then transmute it.
MethodHandle collector = transmuteArray
    .asCollector(Object[].class, count)
    .asType(source.dropParameterTypes(0, index).changeReturnType(arrayType));

return MethodHandles.collectArguments(target, index, collector);
```

Hopefully this is mostly readable. Basically I craft a chain of
handles that uses the normal Object[] collector and then simulates
what the pre-Jorn asCollector does: allocate the actual array we want
and arraycopy everything over. I figured this would be worth a try
since Jorn's comments on the PR hinted at the intermediate Object[]
going away for some collect forms. Unfortunately, reproducing the old
asCollector using MethodHandles does not appear to work any better...
or at least it still pales compared to a collector function.

I am open to suggestions because my next attempt will probably be to
chain a series of folds together that populate the target array
directly, but it will be array.length deep. Not ideal and not a good
general solution.
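For contrast, the hand-written collector approach that keeps outperforming asCollector can be sketched as follows. This is a minimal, hypothetical example: `collect3` and `targetArity` are stand-ins for JRuby's generated collector and its IRubyObject[]-taking target, with String[] playing the role of IRubyObject[].

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class CollectorSketch {
    // Hand-written collector: boxes three arguments straight into the typed
    // array, with no intermediate Object[] (String[] stands in for IRubyObject[]).
    public static String[] collect3(Object a, Object b, Object c) {
        return new String[] { (String) a, (String) b, (String) c };
    }

    // Target taking the typed array, like the Array#dig binding in the benchmark.
    public static int targetArity(String[] args) {
        return args.length;
    }

    public static int demo() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle target = lookup.findStatic(CollectorSketch.class, "targetArity",
                    MethodType.methodType(int.class, String[].class));
            MethodHandle collector = lookup.findStatic(CollectorSketch.class, "collect3",
                    MethodType.methodType(String[].class, Object.class, Object.class, Object.class));
            // Replace the String[] parameter at position 0 with collect3's three
            // Object parameters; this is the fast path the benchmarks measure.
            MethodHandle site = MethodHandles.collectArguments(target, 0, collector);
            return (int) site.invokeExact((Object) "a", (Object) "b", (Object) "c");
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        if (demo() != 3) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The downside, as noted in the thread, is that a collector method like `collect3` has to exist for every arity and element type you care about.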

On Thu, Apr 1, 2021 at 6:44 PM Charles Oliver Nutter
 wrote:
>
> Very nice! I will have a look at the pull request and perhaps it will lead me 
> to a short-term workaround as well.
>
> On Thu, Apr 1, 2021, 12:04 Jorn Vernee  wrote:
>>
>> Hi Charlie,
>>
>> (Sorry for replying out of line like this, but I'm not currently
>> subscribed to the mlvm-dev mailing list, so I could not reply to your
>> earlier email thread directly.)
>>
>> I have fixed the performance issue with asCollector you reported [1],
>> and with the patch the performance should be the same/similar for any
>> array type (as well as fixing a related issue with collectors that take
>> more than 10 arguments). The patch is out for review here:
>> https://github.com/openjdk/jdk/pull/3306
>>
>> Cheers,
>> Jorn
>>
>> [1] : https://bugs.openjdk.java.net/browse/JDK-8264288
>>
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
https://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Getting back into indy...need a better argument collector!

2021-04-01 Thread Charles Oliver Nutter
Very nice! I will have a look at the pull request and perhaps it will lead
me to a short-term workaround as well.

On Thu, Apr 1, 2021, 12:04 Jorn Vernee  wrote:

> Hi Charlie,
>
> (Sorry for replying out of line like this, but I'm not currently
> subscribed to the mlvm-dev mailing list, so I could not reply to your
> earlier email thread directly.)
>
> I have fixed the performance issue with asCollector you reported [1],
> and with the patch the performance should be the same/similar for any
> array type (as well as fixing a related issue with collectors that take
> more than 10 arguments). The patch is out for review here:
> https://github.com/openjdk/jdk/pull/3306
>
> Cheers,
> Jorn
>
> [1] : https://bugs.openjdk.java.net/browse/JDK-8264288
>
>


Re: Getting back into indy...need a better argument collector!

2021-03-26 Thread Charles Oliver Nutter
Thanks Paul! I am moving forward with my JRuby PRs but if I can help
in any way let me know.

I am especially interested in whether there might be some workaround
rather than having to write my own custom argument boxing collectors.
Will try to poke around at other combinations of handles and see what
I can come up with.

- Charlie

On Fri, Mar 26, 2021 at 11:20 AM Paul Sandoz  wrote:
>
> Hi Charlie,
>
> Thanks for the details. I quickly logged:
>
>   https://bugs.openjdk.java.net/browse/JDK-8264288
>
> I don’t have time to dive into the details right now. Perhaps next week, or 
> hopefully someone else can.
>
> Paul.
>
> > On Mar 25, 2021, at 9:25 PM, Charles Oliver Nutter  
> > wrote:
> >
> > JRuby branch with changes to use our own collector methods:
> > https://github.com/jruby/jruby/pull/6630
> >
> > InvokeBinder 1.2 added collect(index, type, collector) that calls
> > MethodHandles.collectArguments:
> > https://github.com/headius/invokebinder/commit/9650de07715c6e15a8ca4029c40ea5ede9d5c4c9
> >
> > A build of JRuby from the branch (or from jruby-9.2 branch or master
> > once it is merged) compared with JRuby 9.2.16.0 should show the issue.
> > Benchmark included in the PR above.
> >
> > On Thu, Mar 25, 2021 at 8:43 PM Charles Oliver Nutter
> >  wrote:
> >>
> >> After experimenting with MethodHandles.collectArguments (given a
> >> hand-written collector function) versus my own logic (using folds and
> >> permutes to call my collector), I can confirm that both are roughly
> >> equivalent and better than MethodHandle.asCollector.
> >>
> >> The benchmark linked below calls a lightweight core Ruby method
> >> (Array#dig) that only accepts an IRubyObject[] (so all arities must
> >> box). The performance of collectArguments is substantially better than
> >> asCollector.
> >>
> >> https://gist.github.com/headius/28343b8c393e76c717314af57089848d
> >>
> >> I do not believe this should be so. The logic for asCollector should
> >> be able to gather up Object subtypes into an Object[] subtype without
> >> an intermediate array or extra copying.
> >>
> >> On Thu, Mar 25, 2021 at 7:39 PM Charles Oliver Nutter
> >>  wrote:
> >>>
> >>> Well it only took me five years to circle back to this but I can
> >>> confirm it is just as bad now as it ever was. And it is definitely due
> >>> to collecting a single type.
> >>>
> >>> I will provide whatever folks need to investigate but it is pretty
> >>> straightforward. When asking for asCollector of a non-Object[] type,
> >>> the implementation will first gather arguments into an Object[], and
> >>> then create a copy of that array as the correct type. So two arrays
> >>> are created, values are copied twice.
> >>>
> >>> I can see this quite clearly in the assembly after letting things
> >>> optimize. A new Object[] is created and populated, and then a second
> >>> array of the correct type is created followed by an arraycopy
> >>> operation.
> >>>
> >>> I am once again backing off using asCollector directly to instead
> >>> provide my own array-construction collector.
> >>>
> >>> Should be easy to reproduce the perf issues simply by doing an
> >>> asCollector that results in some subtype of Object[].
> >>>
> >>> On Thu, Jan 14, 2016 at 8:18 PM Charles Oliver Nutter
> >>>  wrote:
> >>>>
> >>>> Thanks Duncan. I will try to look under the covers this evening.
> >>>>
> >>>> - Charlie (mobile)
> >>>>
> >>>> On Jan 14, 2016 14:39, "MacGregor, Duncan (GE Energy Management)" 
> >>>>  wrote:
> >>>>>
> >>>>> On 11/01/2016, 11:27, "mlvm-dev on behalf of MacGregor, Duncan (GE Energy
> >>>>> Management)" <duncan.macgre...@ge.com> wrote:
> >>>>>
> >>>>>> On 11/01/2016, 03:16, "mlvm-dev on behalf of Charles Oliver Nutter"
> >>>>>> 
> >>>>>> wrote:
> >>>>>> ...
> >>>>>>> With asCollector: 16-17s per iteration
> >>>>>>>
> >>>>>>> With hand-written array construction: 7-8s per iteration
> >>>>>>>
> >>>>>>> A sampling profile only shows my Ruby code as the

Re: Getting back into indy...need a better argument collector!

2021-03-25 Thread Charles Oliver Nutter
JRuby branch with changes to use our own collector methods:
https://github.com/jruby/jruby/pull/6630

InvokeBinder 1.2 added collect(index, type, collector) that calls
MethodHandles.collectArguments:
https://github.com/headius/invokebinder/commit/9650de07715c6e15a8ca4029c40ea5ede9d5c4c9

A build of JRuby from the branch (or from jruby-9.2 branch or master
once it is merged) compared with JRuby 9.2.16.0 should show the issue.
Benchmark included in the PR above.

On Thu, Mar 25, 2021 at 8:43 PM Charles Oliver Nutter
 wrote:
>
> After experimenting with MethodHandles.collectArguments (given a
> hand-written collector function) versus my own logic (using folds and
> permutes to call my collector), I can confirm that both are roughly
> equivalent and better than MethodHandle.asCollector.
>
> The benchmark linked below calls a lightweight core Ruby method
> (Array#dig) that only accepts an IRubyObject[] (so all arities must
> box). The performance of collectArguments is substantially better than
> asCollector.
>
> https://gist.github.com/headius/28343b8c393e76c717314af57089848d
>
> I do not believe this should be so. The logic for asCollector should
> be able to gather up Object subtypes into an Object[] subtype without
> an intermediate array or extra copying.
>
> On Thu, Mar 25, 2021 at 7:39 PM Charles Oliver Nutter
>  wrote:
> >
> > Well it only took me five years to circle back to this but I can
> > confirm it is just as bad now as it ever was. And it is definitely due
> > to collecting a single type.
> >
> > I will provide whatever folks need to investigate but it is pretty
> > straightforward. When asking for asCollector of a non-Object[] type,
> > the implementation will first gather arguments into an Object[], and
> > then create a copy of that array as the correct type. So two arrays
> > are created, values are copied twice.
> >
> > I can see this quite clearly in the assembly after letting things
> > optimize. A new Object[] is created and populated, and then a second
> > array of the correct type is created followed by an arraycopy
> > operation.
> >
> > I am once again backing off using asCollector directly to instead
> > provide my own array-construction collector.
> >
> > Should be easy to reproduce the perf issues simply by doing an
> > asCollector that results in some subtype of Object[].
> >
> > On Thu, Jan 14, 2016 at 8:18 PM Charles Oliver Nutter
> >  wrote:
> > >
> > > Thanks Duncan. I will try to look under the covers this evening.
> > >
> > > - Charlie (mobile)
> > >
> > > On Jan 14, 2016 14:39, "MacGregor, Duncan (GE Energy Management)" 
> > >  wrote:
> > >>
> > >> On 11/01/2016, 11:27, "mlvm-dev on behalf of MacGregor, Duncan (GE Energy
> > >> Management)" <duncan.macgre...@ge.com> wrote:
> > >>
> > >> >On 11/01/2016, 03:16, "mlvm-dev on behalf of Charles Oliver Nutter"
> > >> >
> > >> >wrote:
> > >> >...
> > >> >>With asCollector: 16-17s per iteration
> > >> >>
> > >> >>With hand-written array construction: 7-8s per iteration
> > >> >>
> > >> >>A sampling profile only shows my Ruby code as the top items, and an
> > >> >>allocation trace shows Object[] as the number one object being
> > >> >>created...not IRubyObject[]. Could that be the reason it's slower?
> > >> >>Some type trickery messing with optimization?
> > >> >>
> > >> >>This is very unfortunate because there's no other general-purpose way
> > >> >>to collect arguments in a handle chain.
> > >> >
> > >> >I haven't done any comparative benchmarks in that area for a while, but
> > >> >collecting a single argument is a pretty common pattern in the Magik code,
> > >> >and I had not seen any substantial difference when we last touched that
> > >> >area. However we are collecting to plain Object[] so it might be that is
> > >> >the reason for the difference. If I've got time later this week I'll do
> > >> >some experimenting and check what the current situation is.
> > >>
> > >> Okay, I’ve now had a chance to try this out with our language benchmarks
> > >> and can’t see any significant difference between a hand-crafted method
> > >> and asCollector, but we are dealing with Object and Object[], so it might
> > >> be something to do with additional casting.
> > >>
> > >> Duncan.
> > >>


Re: Getting back into indy...need a better argument collector!

2021-03-25 Thread Charles Oliver Nutter
After experimenting with MethodHandles.collectArguments (given a
hand-written collector function) versus my own logic (using folds and
permutes to call my collector), I can confirm that both are roughly
equivalent and better than MethodHandle.asCollector.

The benchmark linked below calls a lightweight core Ruby method
(Array#dig) that only accepts an IRubyObject[] (so all arities must
box). The performance of collectArguments is substantially better than
asCollector.

https://gist.github.com/headius/28343b8c393e76c717314af57089848d

I do not believe this should be so. The logic for asCollector should
be able to gather up Object subtypes into an Object[] subtype without
an intermediate array or extra copying.

On Thu, Mar 25, 2021 at 7:39 PM Charles Oliver Nutter
 wrote:
>
> Well it only took me five years to circle back to this but I can
> confirm it is just as bad now as it ever was. And it is definitely due
> to collecting a single type.
>
> I will provide whatever folks need to investigate but it is pretty
> straightforward. When asking for asCollector of a non-Object[] type,
> the implementation will first gather arguments into an Object[], and
> then create a copy of that array as the correct type. So two arrays
> are created, values are copied twice.
>
> I can see this quite clearly in the assembly after letting things
> optimize. A new Object[] is created and populated, and then a second
> array of the correct type is created followed by an arraycopy
> operation.
>
> I am once again backing off using asCollector directly to instead
> provide my own array-construction collector.
>
> Should be easy to reproduce the perf issues simply by doing an
> asCollector that results in some subtype of Object[].
>
> On Thu, Jan 14, 2016 at 8:18 PM Charles Oliver Nutter
>  wrote:
> >
> > Thanks Duncan. I will try to look under the covers this evening.
> >
> > - Charlie (mobile)
> >
> > On Jan 14, 2016 14:39, "MacGregor, Duncan (GE Energy Management)" 
> >  wrote:
> >>
> >> On 11/01/2016, 11:27, "mlvm-dev on behalf of MacGregor, Duncan (GE Energy
> >> Management)" <duncan.macgre...@ge.com> wrote:
> >>
> >> >On 11/01/2016, 03:16, "mlvm-dev on behalf of Charles Oliver Nutter"
> >> >
> >> >wrote:
> >> >...
> >> >>With asCollector: 16-17s per iteration
> >> >>
> >> >>With hand-written array construction: 7-8s per iteration
> >> >>
> >> >>A sampling profile only shows my Ruby code as the top items, and an
> >> >>allocation trace shows Object[] as the number one object being
> >> >>created...not IRubyObject[]. Could that be the reason it's slower?
> >> >>Some type trickery messing with optimization?
> >> >>
> >> >>This is very unfortunate because there's no other general-purpose way
> >> >>to collect arguments in a handle chain.
> >> >
> >> >I haven't done any comparative benchmarks in that area for a while, but
> >> >collecting a single argument is a pretty common pattern in the Magik code,
> >> >and I had not seen any substantial difference when we last touched that
> >> >area. However we are collecting to plain Object[] so it might be that is
> >> >the reason for the difference. If I've got time later this week I'll do
> >> >some experimenting and check what the current situation is.
> >>
> >> Okay, I’ve now had a chance to try this out with our language benchmarks
> >> and can’t see any significant difference between a hand-crafted method and
> >> asCollector, but we are dealing with Object and Object[], so it might be
> >> something to do with additional casting.
> >>
> >> Duncan.
> >>


Re: Getting back into indy...need a better argument collector!

2021-03-25 Thread Charles Oliver Nutter
Well it only took me five years to circle back to this but I can
confirm it is just as bad now as it ever was. And it is definitely due
to collecting a single type.

I will provide whatever folks need to investigate but it is pretty
straightforward. When asking for asCollector of a non-Object[] type,
the implementation will first gather arguments into an Object[], and
then create a copy of that array as the correct type. So two arrays
are created, values are copied twice.

I can see this quite clearly in the assembly after letting things
optimize. A new Object[] is created and populated, and then a second
array of the correct type is created followed by an arraycopy
operation.

I am once again backing off using asCollector directly to instead
provide my own array-construction collector.

Should be easy to reproduce the perf issues simply by doing an
asCollector that results in some subtype of Object[].
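A minimal reproducer along those lines might look like this (hypothetical names; String[] stands in for the IRubyObject[] subtype). The result is functionally correct either way; the extra Object[] allocation and arraycopy only show up under a benchmark or in the generated assembly.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class AsCollectorRepro {
    public static int arity(String[] args) {
        return args.length;
    }

    public static int demo() {
        try {
            MethodHandle target = MethodHandles.lookup().findStatic(
                    AsCollectorRepro.class, "arity",
                    MethodType.methodType(int.class, String[].class));
            // Collect the trailing 3 arguments into a String[], which is a
            // subtype of Object[]; this is the shape that triggers the
            // intermediate-Object[] path described above.
            MethodHandle collected = target.asCollector(String[].class, 3);
            return (int) collected.invoke("a", "b", "c");
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        if (demo() != 3) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Swapping `String[].class` for `Object[].class` here should make the difference disappear, matching Duncan's observation that plain Object[] collection performs fine.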

On Thu, Jan 14, 2016 at 8:18 PM Charles Oliver Nutter
 wrote:
>
> Thanks Duncan. I will try to look under the covers this evening.
>
> - Charlie (mobile)
>
> On Jan 14, 2016 14:39, "MacGregor, Duncan (GE Energy Management)" 
>  wrote:
>>
>> On 11/01/2016, 11:27, "mlvm-dev on behalf of MacGregor, Duncan (GE Energy
>> Management)" <duncan.macgre...@ge.com> wrote:
>>
>> >On 11/01/2016, 03:16, "mlvm-dev on behalf of Charles Oliver Nutter"
>> >
>> >wrote:
>> >...
>> >>With asCollector: 16-17s per iteration
>> >>
>> >>With hand-written array construction: 7-8s per iteration
>> >>
>> >>A sampling profile only shows my Ruby code as the top items, and an
>> >>allocation trace shows Object[] as the number one object being
>> >>created...not IRubyObject[]. Could that be the reason it's slower?
>> >>Some type trickery messing with optimization?
>> >>
>> >>This is very unfortunate because there's no other general-purpose way
>> >>to collect arguments in a handle chain.
>> >
>> >I haven't done any comparative benchmarks in that area for a while, but
>> >collecting a single argument is a pretty common pattern in the Magik code,
>> >and I had not seen any substantial difference when we last touched that
>> >area. However we are collecting to plain Object[] so it might be that is
>> >the reason for the difference. If I've got time later this week I'll do
>> >some experimenting and check what the current situation is.
>>
>> Okay, I’ve now had a chance to try this out with our language benchmarks
>> and can’t see any significant difference between a hand-crafted method and
>> asCollector, but we are dealing with Object and Object[], so it might be
>> something to do with additional casting.
>>
>> Duncan.
>>


Re: NoClassDefFoundError using LMF against a generated class's handle

2018-06-29 Thread Charles Oliver Nutter
To help illustrate a bit, here's a snippet of the code to create the
allocator. It succeeds, but the allocator later throws NoClassDefFoundError.

https://gist.github.com/headius/cce750221cf73df76cb7f7ce92c1a759

- Charlie

On Fri, Jun 29, 2018 at 8:00 PM, Charles Oliver Nutter 
wrote:

> Hello folks!
>
> I'm improving JRuby's support for instance variables-as-fields, which
> involves generating a new JVM class with a field per instance variable in
> the Ruby class.
>
> The construction process for these classes involves an implementation of
> my "ObjectAllocator" interface, which is stored with the Ruby class.
>
> Previously, the generated classes also included a generated child class
> that implemented ObjectAllocator appropriately. I was hoping to use
> LambdaMetafactory to avoid generating that class, but I'm running into a
> problem.
>
> Say we have a Ruby class with three instance variables. JRuby will
> generate a "RubyObject3" class that holds those variables in their own
> fields var0, var1, and var2. The process leading up to the bug goes like
> this:
>
> * Generate the RubyObject3 class, in its own classloader that's a child of
> the current one.
> * Acquire a constructor handle for that class.
> * Use that constructor with LambdaMetafactory.metafactory to produce an
> allocator-creating call site.
> * Invoke that call site to get the one allocator instance we need.
>
> Note that since the metafactory call requires a Lookup, I am providing it
> one from the parent classloader.
>
> I am able to get through this process without error. However, when I
> finally invoke the allocator, I get a NoClassDefFoundError and a stack
> trace that ends at the allocator call.
>
> So...what am I doing wrong?
>
> - Charlie
>


NoClassDefFoundError using LMF against a generated class's handle

2018-06-29 Thread Charles Oliver Nutter
Hello folks!

I'm improving JRuby's support for instance variables-as-fields, which
involves generating a new JVM class with a field per instance variable in
the Ruby class.

The construction process for these classes involves an implementation of my
"ObjectAllocator" interface, which is stored with the Ruby class.

Previously, the generated classes also included a generated child class
that implemented ObjectAllocator appropriately. I was hoping to use
LambdaMetafactory to avoid generating that class, but I'm running into a
problem.

Say we have a Ruby class with three instance variables. JRuby will generate
a "RubyObject3" class that holds those variables in their own fields var0,
var1, and var2. The process leading up to the bug goes like this:

* Generate the RubyObject3 class, in its own classloader that's a child of
the current one.
* Acquire a constructor handle for that class.
* Use that constructor with LambdaMetafactory.metafactory to produce an
allocator-creating call site.
* Invoke that call site to get the one allocator instance we need.

Note that since the metafactory call requires a Lookup, I am providing it
one from the parent classloader.
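The steps above, minus the child classloader, can be sketched like this. Everything here is hypothetical stand-in naming: java.util.function.Supplier replaces the ObjectAllocator interface and a nested Pojo class replaces the generated RubyObject3. In this single-loader form the whole sequence succeeds; the failure described below arises when the class behind the constructor handle lives in a loader the Lookup cannot see.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Supplier;

public class AllocatorSketch {
    // Stand-in for the generated RubyObject3 class.
    public static class Pojo {
        public Pojo() {}
    }

    public static Supplier<Pojo> makeAllocator() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // Acquire a constructor handle for the class.
            MethodHandle ctor = lookup.findConstructor(
                    Pojo.class, MethodType.methodType(void.class));
            // Use it with LambdaMetafactory to produce an allocator-creating call site.
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "get",                                   // Supplier's SAM name
                    MethodType.methodType(Supplier.class),   // factory signature
                    MethodType.methodType(Object.class),     // erased SAM signature
                    ctor,                                    // implementation handle
                    MethodType.methodType(Pojo.class));      // instantiated SAM signature
            // Invoke the call site once to get the single allocator instance.
            @SuppressWarnings("unchecked")
            Supplier<Pojo> allocator = (Supplier<Pojo>) site.getTarget().invoke();
            return allocator;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        if (!(makeAllocator().get() instanceof Pojo)) throw new AssertionError();
        System.out.println("ok");
    }
}
```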

I am able to get through this process without error. However, when I
finally invoke the allocator, I get a NoClassDefFoundError and a stack
trace that ends at the allocator call.

So...what am I doing wrong?

- Charlie


Re: ClassValue rooting objects after it goes away?

2018-03-02 Thread Charles Oliver Nutter
Put it another way: does a static reference from a class to itself prevent
that class from being garbage collected? Of course not. ClassValue is
intended to be a way to inject pseudo-static data into either a class or a
Class. Injecting that data, even if it has a reference back to the class,
should not prevent the class from being collected.
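A minimal sketch of the pattern in question (hypothetical names: `Info` plays the role of JRuby's Ruby class wrapper, holding a reference back to the Class it was computed for, and the static `CACHE` plays the role of the long-lived utility object in the JRuby runtime):

```java
public class ClassInfoCache {
    // Value that refers back to its class, like JRuby's Ruby class wrapper.
    static final class Info {
        final Class<?> subject;
        Info(Class<?> subject) { this.subject = subject; }
    }

    // Long-lived ClassValue, analogous to the one held by the JRuby runtime.
    static final ClassValue<Info> CACHE = new ClassValue<Info>() {
        @Override
        protected Info computeValue(Class<?> type) {
            return new Info(type); // back-reference from the value to the class
        }
    };

    public static void main(String[] args) {
        Info a = CACHE.get(String.class);
        Info b = CACHE.get(String.class);
        if (a != b) throw new AssertionError("computeValue should run once per class");
        if (a.subject != String.class) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The question in the thread is whether this shape (hard-reachable ClassValue plus a value that references its own Class) should keep the Class, and hence its classloader, alive.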

On Fri, Mar 2, 2018 at 2:19 PM Charles Oliver Nutter 
wrote:

> I have posted a modified version of my description to the main bug report.
>
> TLDR: ClassValue should not root objects.
>
> - Charlie
>
> On Fri, Mar 2, 2018 at 2:13 PM Charles Oliver Nutter 
> wrote:
>
>> Yes, it may be the same bug.
>>
>> In my case, the ClassValue is held by a utility object used for our Java
>> integration. That utility object has to live somewhere, so it's held by the
>> JRuby runtime instance. There's a strong reference chain leading to the
>> ClassValue.
>>
>> The value is a Ruby representation of the class, with reflected methods
>> parsed out and turned into Ruby endpoints. Obviously, the value also
>> references the class, either directly or indirectly through reflected
>> members.
>>
>> The Ruby class wrapper is only hard referenced directly if there's an
>> instance of the object live and moving through JRuby. It may be referenced
>> indirectly through inline caches.
>>
>> However...I do not believe this should prevent collection of the class
>> associated with the ClassValue.
>>
>> The value referenced in the ClassValue should not constitute a hard
>> reference. If it is alive *only* because of its association with a given
>> class, that should not be enough to root either the object or the class.
>>
>> ClassValue should work like ThreadLocal. If the Thread associated with a
>> value goes away, the value reference goes away. ThreadLocal does nothing
>> to prevent it from being collected. If the Class associated with a Value goes
>> away, the same should happen to that Value and it should be collectable
>> once all other hard references are gone.
>>
>> Perhaps I've misunderstood?
>>
>> - Charlie
>>
>> On Fri, Mar 2, 2018 at 12:16 PM Vladimir Ivanov <
>> vladimir.x.iva...@oracle.com> wrote:
>>
>>> Charlie,
>>>
>>> Does it look similar to the following bugs?
>>>https://bugs.openjdk.java.net/browse/JDK-8136353
>>>https://bugs.openjdk.java.net/browse/JDK-8169425
>>>
>>> If that's the same (and it seems so to me [1]), then speak up and
>>> persuade Paul it's an important edge case (as stated in JDK-8169425).
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] new RubyClass(Ruby.this) in
>>>
>>>  public static class Ruby {
>>>  private ClassValue<RubyClass> cache = new ClassValue<RubyClass>() {
>>>  protected RubyClass computeValue(Class type) {
>>>  return new RubyClass(Ruby.this);
>>>  }
>>>  };
>>>
>>> On 3/1/18 2:25 AM, Charles Oliver Nutter wrote:
>>> > So I don't think we ever closed the loop here. Did anyone on the JDK
>>> > side confirm this, file an issue, or fix it?
>>> >
>>> > We still have ClassValue disabled in JRuby because of the rooting
>>> issues
>>> > described here and in https://github.com/jruby/jruby/pull/3228.
>>> >
>>> > - Charlie
>>> >
>>> > On Thu, Aug 27, 2015 at 7:04 AM Jochen Theodorou <blackd...@gmx.org> wrote:
>>> >
>>> > One more thing...
>>> >
>>> > Remi, I tried your link with my simplified scenario and it does
>>> there
>>> > not stop the collection of the classloader
>>> >
>>> > Am 27.08.2015 11:54, schrieb Jochen Theodorou:
>>> >  > Hi,
>>> >  >
>>> >  > In trying to reproduce the problem outside of Groovy I stumbled
>>> > over a
>>> >  > case which I think should work
>>> >  >
>>> >  > public class MyClassValue extends ClassValue<Object> {
>>> >  >  protected Object computeValue(Class type) {
>>> >  >  Dummy ret = new Dummy();
>>> >  >  Dummy.l.add (this);
>>> >  >  return ret;
>>> >  >  }
>>> >  > }
>>> >  >
>>> >  >   class Dummy {
>>> >

Re: ClassValue rooting objects after it goes away?

2018-03-02 Thread Charles Oliver Nutter
I have posted a modified version of my description to the main bug report.

TLDR: ClassValue should not root objects.

- Charlie

On Fri, Mar 2, 2018 at 2:13 PM Charles Oliver Nutter 
wrote:

> Yes, it may be the same bug.
>
> In my case, the ClassValue is held by a utility object used for our Java
> integration. That utility object has to live somewhere, so it's held by the
> JRuby runtime instance. There's a strong reference chain leading to the
> ClassValue.
>
> The value is a Ruby representation of the class, with reflected methods
> parsed out and turned into Ruby endpoints. Obviously, the value also
> references the class, either directly or indirectly through reflected
> members.
>
> The Ruby class wrapper is only hard referenced directly if there's an
> instance of the object live and moving through JRuby. It may be referenced
> indirectly through inline caches.
>
> However...I do not believe this should prevent collection of the class
> associated with the ClassValue.
>
> The value referenced in the ClassValue should not constitute a hard
> reference. If it is alive *only* because of its association with a given
> class, that should not be enough to root either the object or the class.
>
> ClassValue should work like ThreadLocal. If the Thread associated with a
> value goes away, the value reference goes away. ThreadLocal does nothing
> to prevent it from being collected. If the Class associated with a Value goes
> away, the same should happen to that Value and it should be collectable
> once all other hard references are gone.
>
> Perhaps I've misunderstood?
>
> - Charlie
>
> On Fri, Mar 2, 2018 at 12:16 PM Vladimir Ivanov <
> vladimir.x.iva...@oracle.com> wrote:
>
>> Charlie,
>>
>> Does it look similar to the following bugs?
>>https://bugs.openjdk.java.net/browse/JDK-8136353
>>https://bugs.openjdk.java.net/browse/JDK-8169425
>>
>> If that's the same (and it seems so to me [1]), then speak up and
>> persuade Paul it's an important edge case (as stated in JDK-8169425).
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] new RubyClass(Ruby.this) in
>>
>>  public static class Ruby {
>>  private ClassValue<RubyClass> cache = new ClassValue<RubyClass>() {
>>  protected RubyClass computeValue(Class type) {
>>  return new RubyClass(Ruby.this);
>>  }
>>  };
>>
>> On 3/1/18 2:25 AM, Charles Oliver Nutter wrote:
>> > So I don't think we ever closed the loop here. Did anyone on the JDK
>> > side confirm this, file an issue, or fix it?
>> >
>> > We still have ClassValue disabled in JRuby because of the rooting issues
>> > described here and in https://github.com/jruby/jruby/pull/3228.
>> >
>> > - Charlie
>> >
>> > On Thu, Aug 27, 2015 at 7:04 AM Jochen Theodorou <blackd...@gmx.org> wrote:
>> >
>> > One more thing...
>> >
>> > Remi, I tried your link with my simplified scenario and it does
>> there
>> > not stop the collection of the classloader
>> >
>> > Am 27.08.2015 11:54, schrieb Jochen Theodorou:
>> >  > Hi,
>> >  >
>> >  > In trying to reproduce the problem outside of Groovy I stumbled
>> > over a
>> >  > case which I think should work
>> >  >
>> >  > public class MyClassValue extends ClassValue<Object> {
>> >  >  protected Object computeValue(Class type) {
>> >  >  Dummy ret = new Dummy();
>> >  >  Dummy.l.add (this);
>> >  >  return ret;
>> >  >  }
>> >  > }
>> >  >
>> >  >   class Dummy {
>> >  >   static final ArrayList l = new ArrayList();
>> >  >   }
>> >  >
>> >  > basically this means there will be a hard reference on the
>> ClassValue
>> >  > somewhere. It can be in a static or non-static field, direct or
>> >  > indirect. But this won't collect. If I put for example a
>> > WeakReference
>> >  > in between it works again.
>> >  >
>> >  > Finally I also tested to put the hard reference in a third class
>> >  > instead, to avoid this self reference. But it can still not
>> collect.
>> >  >
>> >  > So I currently have the impression that if anything holds a hard
>> >  > reference on th

Re: ClassValue rooting objects after it goes away?

2018-03-02 Thread Charles Oliver Nutter
Yes, it may be the same bug.

In my case, the ClassValue is held by a utility object used for our Java
integration. That utility object has to live somewhere, so it's held by the
JRuby runtime instance. There's a strong reference chain leading to the
ClassValue.

The value is a Ruby representation of the class, with reflected methods
parsed out and turned into Ruby endpoints. Obviously, the value also
references the class, either directly or indirectly through reflected
members.

The Ruby class wrapper is only hard referenced directly if there's an
instance of the object live and moving through JRuby. It may be referenced
indirectly through inline caches.

However...I do not believe this should prevent collection of the class
associated with the ClassValue.

The value referenced in the ClassValue should not constitute a hard
reference. If it is alive *only* because of its association with a given
class, that should not be enough to root either the object or the class.

ClassValue should work like ThreadLocal. If the Thread associated with a
value goes away, the value reference goes away. ThreadLocal does nothing
to prevent it from being collected. If the Class associated with a Value goes
away, the same should happen to that Value and it should be collectable
once all other hard references are gone.

Perhaps I've misunderstood?
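
To make the expectation concrete, here is a minimal runnable sketch (class
names are illustrative, not JRuby's actual code) of the pattern in question:
the cached value refers back to its Class, which by itself should not be
enough to root the class or its loader.

```java
// Minimal sketch of the pattern under discussion (hypothetical names):
// the value computed for a class keeps a reference back to that class.
public class ClassValueSemantics {
    static final class ClassWrapper {
        final Class<?> wrapped;
        ClassWrapper(Class<?> wrapped) { this.wrapped = wrapped; }
    }

    static final ClassValue<ClassWrapper> WRAPPERS = new ClassValue<ClassWrapper>() {
        @Override protected ClassWrapper computeValue(Class<?> type) {
            return new ClassWrapper(type); // value -> class back-reference
        }
    };

    public static void main(String[] args) {
        ClassWrapper a = WRAPPERS.get(String.class);
        ClassWrapper b = WRAPPERS.get(String.class);
        // computeValue runs once per class; later lookups return the cached instance
        System.out.println(a == b);           // true
        System.out.println(a.wrapped.getName()); // java.lang.String
    }
}
```

The argument in the thread is that this back-reference should behave like a
ThreadLocal value's reference to its Thread: once the Class itself is
otherwise unreachable, the entry should be collectable.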

- Charlie

On Fri, Mar 2, 2018 at 12:16 PM Vladimir Ivanov <
vladimir.x.iva...@oracle.com> wrote:

> Charlie,
>
> Does it look similar to the following bugs?
>https://bugs.openjdk.java.net/browse/JDK-8136353
>https://bugs.openjdk.java.net/browse/JDK-8169425
>
> If that's the same (and it seems so to me [1]), then speak up and
> persuade Paul it's an important edge case (as stated in JDK-8169425).
>
> Best regards,
> Vladimir Ivanov
>
> [1] new RubyClass(Ruby.this) in
>
>  public static class Ruby {
>  private ClassValue<RubyClass> cache = new ClassValue<RubyClass>() {
>  protected RubyClass computeValue(Class<?> type) {
>  return new RubyClass(Ruby.this);
>  }
>  };
>
> On 3/1/18 2:25 AM, Charles Oliver Nutter wrote:
> > So I don't think we ever closed the loop here. Did anyone on the JDK
> > side confirm this, file an issue, or fix it?
> >
> > We still have ClassValue disabled in JRuby because of the rooting issues
> > described here and in https://github.com/jruby/jruby/pull/3228.
> >
> > - Charlie
> >
> > On Thu, Aug 27, 2015 at 7:04 AM Jochen Theodorou  wrote:
> >
> > One more thing...
> >
> > Remi, I tried your link with my simplified scenario and it does there
> > not stop the collection of the classloader
> >
> > Am 27.08.2015 11:54, schrieb Jochen Theodorou:
> >  > Hi,
> >  >
> >  > In trying to reproduce the problem outside of Groovy I stumbled
> > over a
> >  > case which I think should work
> >  >
> >  > public class MyClassValue extends ClassValue<Object> {
> >  >  protected Object computeValue(Class<?> type) {
> >  >  Dummy ret = new Dummy();
> >  >  Dummy.l.add(this);
> >  >  return ret;
> >  >  }
> >  > }
> >  >
> >  >   class Dummy {
> >  >   static final ArrayList<Object> l = new ArrayList<Object>();
> >  >   }
> >  >
> >  > basically this means there will be a hard reference on the
> ClassValue
> >  > somewhere. It can be in a static or non-static field, direct or
> >  > indirect. But this won't collect. If I put for example a
> > WeakReference
> >  > in between it works again.
> >  >
> >  > Finally I also tested to put the hard reference in a third class
> >  > instead, to avoid this self reference. But it can still not
> collect.
> >  >
> >  > So I currently have the impression that if anything holds a hard
> >  > reference on the class value that the classloader cannot be
> collected
> >  > anymore.
> >  >
> >  > Unless I misunderstand something here I see that as a bug
> >  >
> >  > bye blackdrag
> >  >
> >
> >
> > --
> > Jochen "blackdrag" Theodorou
> > blog: http://blackdragsview.blogspot.com/
> >
> > ___
> > mlvm-dev mailing list
> > mlvm-dev@openjdk.java.net
> > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
> >
> > --
> >
> > - Charlie (mobile)
> >
> >
> >
> > ___
> > mlvm-dev mailing list
> > mlvm-dev@openjdk.java.net
> > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
> >
>
-- 

- Charlie (mobile)
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Interface injection in an age of default interface methods

2018-02-28 Thread Charles Oliver Nutter
Here's an oldie but goodie: whatever happened to interface injection?

For those unfamiliar, we dynlang guys had an idea years ago that if we
could simply "force" an interface into an existing Java class, with a
handler dangling off the side, we could pass normal Java objects through
languages that have their own supertypes without needing a wrapper.

So in the case of JRuby, where every method signature and every local
variable is typed IRubyObject, we'd inject a default impl of IRubyObject
into java.lang.Object, and it would know how to handle all our dispatch
logic.

Back in the day, one of the sticky bits was wiring together the
implementation of all those interface methods. These days, perhaps that's
not a problem with default interface methods from Java 8?

Perhaps the JVM could (at some point) even allow you to cast an object to
*any* interface, so long as all that interface's methods had default or
natural implementations?
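
As a concrete sketch of the idea (the interface and names below are
hypothetical, not a real JRuby API): with Java 8 default methods the interface
can carry its whole implementation; the missing piece is JVM support for
viewing an arbitrary object through it without an explicit implements clause.

```java
// Hypothetical sketch: an interface whose every method has a default body.
// With default methods, the "handler dangling off the side" can live in the
// interface itself; what injection would add is the ability to treat any
// object (e.g. any java.lang.Object) as an instance of it.
interface IRubyObjectLike {
    default String rubyInspect() {
        return "#<" + getClass().getSimpleName() + ">";
    }
}

public class InjectionSketch {
    // Today a class must opt in explicitly; injection would remove this step.
    static class SomeJavaType implements IRubyObjectLike {}

    public static void main(String[] args) {
        IRubyObjectLike obj = new SomeJavaType();
        System.out.println(obj.rubyInspect()); // #<SomeJavaType>
    }
}
```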

- Charlie
-- 

- Charlie (mobile)
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: ClassValue rooting objects after it goes away?

2018-02-28 Thread Charles Oliver Nutter
So I don't think we ever closed the loop here. Did anyone on the JDK side
confirm this, file an issue, or fix it?

We still have ClassValue disabled in JRuby because of the rooting issues
described here and in https://github.com/jruby/jruby/pull/3228.

- Charlie

On Thu, Aug 27, 2015 at 7:04 AM Jochen Theodorou  wrote:

> One more thing...
>
> Remi, I tried your link with my simplified scenario and it does there
> not stop the collection of the classloader
>
> Am 27.08.2015 11:54, schrieb Jochen Theodorou:
> > Hi,
> >
> > In trying to reproduce the problem outside of Groovy I stumbled over a
> > case which I think should work
> >
> > public class MyClassValue extends ClassValue<Object> {
> >  protected Object computeValue(Class<?> type) {
> >  Dummy ret = new Dummy();
> >  Dummy.l.add(this);
> >  return ret;
> >  }
> > }
> >
> >   class Dummy {
> >   static final ArrayList<Object> l = new ArrayList<Object>();
> >   }
> >
> > basically this means there will be a hard reference on the ClassValue
> > somewhere. It can be in a static or non-static field, direct or
> > indirect. But this won't collect. If I put for example a WeakReference
> > in between it works again.
> >
> > Finally I also tested to put the hard reference in a third class
> > instead, to avoid this self reference. But it can still not collect.
> >
> > So I currently have the impression that if anything holds a hard
> > reference on the class value that the classloader cannot be collected
> > anymore.
> >
> > Unless I misunderstand something here I see that as a bug
> >
> > bye blackdrag
> >
>
>
> --
> Jochen "blackdrag" Theodorou
> blog: http://blackdragsview.blogspot.com/
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
-- 

- Charlie (mobile)
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Error, Java 8, lambda form compilation

2018-02-28 Thread Charles Oliver Nutter
Ah-ha...I added some logging, which of course made the error go away...but
about ten tests later I got a metaspace OOM.

Could be this was all just a memory issue, but it would be nice if the
error didn't get swallowed.

- Charlie

On Wed, Feb 28, 2018 at 12:40 PM Charles Oliver Nutter 
wrote:

> Hey, I'm still not sure how best to deal with this, but we've been
> consistently getting a similar error at the same place. It has kept JRuby
> master CI red for many weeks.
>
> The problem does not reproduce when running in isolation...only in a long
> test run, and so far only on Travis CI (Ubuntu 16.something, Java 8u151).
>
> Looking at the code, it appears the dropArguments call below (called from
> MethodHandles.guardWithTest:3018) was replaced with some new code and
> dropArgumentsToMatch in 9. I have not read through logs to see if that
> change might be related.
>
> Unhandled Java exception: java.lang.InternalError: 
> exactInvoker=Lambda(a0:L/SpeciesData,a1:L,a2:L)=>{
>  [exec] t3:L=BoundMethodHandle$Species_LL.argL1(a0:L);
>  [exec] t4:L=MethodHandle.invokeBasic(t3:L);
>  [exec] t5:L=BoundMethodHandle$Species_LL.argL0(a0:L);
>  [exec] t6:V=Invokers.checkExactType(t4:L,t5:L);
>  [exec] t7:V=Invokers.checkCustomized(t4:L);
>  [exec] t8:I=MethodHandle.invokeBasic(t4:L);t8:I}
>  [exec] java.lang.InternalError: 
> exactInvoker=Lambda(a0:L/SpeciesData,a1:L,a2:L)=>{
>  [exec] t3:L=BoundMethodHandle$Species_LL.argL1(a0:L);
>  [exec] t4:L=MethodHandle.invokeBasic(t3:L);
>  [exec] t5:L=BoundMethodHandle$Species_LL.argL0(a0:L);
>  [exec] t6:V=Invokers.checkExactType(t4:L,t5:L);
>  [exec] t7:V=Invokers.checkCustomized(t4:L);
>  [exec] t8:I=MethodHandle.invokeBasic(t4:L);t8:I}
>  [exec]newInternalError at 
> java/lang/invoke/MethodHandleStatics.java:127
>  [exec]   compileToBytecode at java/lang/invoke/LambdaForm.java:660
>  [exec] prepare at java/lang/invoke/LambdaForm.java:635
>  [exec]   at java/lang/invoke/MethodHandle.java:461
>  [exec]   at java/lang/invoke/BoundMethodHandle.java:58
>  [exec]   at java/lang/invoke/Species_LL:-1
>  [exec]copyWith at java/lang/invoke/Species_LL:-1
>  [exec]   dropArguments at java/lang/invoke/MethodHandles.java:2465
>  [exec]   guardWithTest at java/lang/invoke/MethodHandles.java:3018
>  [exec]   guardWithTest at java/lang/invoke/SwitchPoint.java:173
>  [exec] searchConst at 
> org/jruby/ir/targets/ConstantLookupSite.java:103
>
>
> On Fri, Jan 12, 2018 at 9:54 AM Charles Oliver Nutter 
> wrote:
>
>> I wish I could provide more info here. Just got another one in CI:
>>
>>  [exec] [1603/8763] 
>> TestBenchmark#test_benchmark_makes_extra_calcultations_with_an_Array_at_the_end_of_the_benchmark_and_show_the_resultUnhandled
>>  Java exception: java.lang.BootstrapMethodError: call site initialization 
>> exception
>>  [exec] java.lang.BootstrapMethodError: call site initialization 
>> exception
>>  [exec]   makeSite at java/lang/invoke/CallSite.java:341
>>  [exec]   linkCallSiteImpl at 
>> java/lang/invoke/MethodHandleNatives.java:307
>>  [exec]   linkCallSite at 
>> java/lang/invoke/MethodHandleNatives.java:297
>>  [exec]   block in autorun at 
>> /home/travis/build/jruby/jruby/test/mri/lib/test/unit.rb:935
>>  [exec] callDirect at 
>> org/jruby/runtime/CompiledIRBlockBody.java:151
>>  [exec]   call at org/jruby/runtime/IRBlockBody.java:77
>>  [exec]   call at org/jruby/runtime/Block.java:124
>>  [exec]   call at org/jruby/RubyProc.java:288
>>  [exec]   call at org/jruby/RubyProc.java:272
>>  [exec]   tearDown at org/jruby/Ruby.java:3276
>>  [exec]   tearDown at org/jruby/Ruby.java:3249
>>  [exec]internalRun at org/jruby/Main.java:309
>>  [exec]run at org/jruby/Main.java:232
>>  [exec]   main at org/jruby/Main.java:204
>>  [exec]
>>  [exec] Caused by:
>>  [exec] java.lang.InternalError: 
>> BMH.reinvoke=Lambda(a0:L/SpeciesData,a1:L,a2:L,a3:L)=>{
>>  [exec] t4:L=Species_L.argL0(a0:L);
>>  [exec] t5:L=MethodHandle.invokeBasic(t4:L,a1:L,a2:L,a3:L);t5:L}
>>  [exec] newInternalError at 
>> java/lang/invoke/MethodHandleStatics.java:127
>>  [exec]compileToBytecode at java/lang/invoke/LambdaForm.java:660
 [exec]  prepare at java/lang/invoke/LambdaForm.java:635

Re: Error, Java 8, lambda form compilation

2018-02-28 Thread Charles Oliver Nutter
Hey, I'm still not sure how best to deal with this, but we've been
consistently getting a similar error at the same place. It has kept JRuby
master CI red for many weeks.

The problem does not reproduce when running in isolation...only in a long
test run, and so far only on Travis CI (Ubuntu 16.something, Java 8u151).

Looking at the code, it appears the dropArguments call below (called from
MethodHandles.guardWithTest:3018) was replaced with some new code and
dropArgumentsToMatch in 9. I have not read through logs to see if that
change might be related.

Unhandled Java exception: java.lang.InternalError:
exactInvoker=Lambda(a0:L/SpeciesData,a1:L,a2:L)=>{
 [exec] t3:L=BoundMethodHandle$Species_LL.argL1(a0:L);
 [exec] t4:L=MethodHandle.invokeBasic(t3:L);
 [exec] t5:L=BoundMethodHandle$Species_LL.argL0(a0:L);
 [exec] t6:V=Invokers.checkExactType(t4:L,t5:L);
 [exec] t7:V=Invokers.checkCustomized(t4:L);
 [exec] t8:I=MethodHandle.invokeBasic(t4:L);t8:I}
 [exec] java.lang.InternalError:
exactInvoker=Lambda(a0:L/SpeciesData,a1:L,a2:L)=>{
 [exec] t3:L=BoundMethodHandle$Species_LL.argL1(a0:L);
 [exec] t4:L=MethodHandle.invokeBasic(t3:L);
 [exec] t5:L=BoundMethodHandle$Species_LL.argL0(a0:L);
 [exec] t6:V=Invokers.checkExactType(t4:L,t5:L);
 [exec] t7:V=Invokers.checkCustomized(t4:L);
 [exec] t8:I=MethodHandle.invokeBasic(t4:L);t8:I}
 [exec]newInternalError at java/lang/invoke/MethodHandleStatics.java:127
 [exec]   compileToBytecode at java/lang/invoke/LambdaForm.java:660
 [exec] prepare at java/lang/invoke/LambdaForm.java:635
 [exec]   at java/lang/invoke/MethodHandle.java:461
 [exec]   at java/lang/invoke/BoundMethodHandle.java:58
 [exec]   at java/lang/invoke/Species_LL:-1
 [exec]copyWith at java/lang/invoke/Species_LL:-1
 [exec]   dropArguments at java/lang/invoke/MethodHandles.java:2465
 [exec]   guardWithTest at java/lang/invoke/MethodHandles.java:3018
 [exec]   guardWithTest at java/lang/invoke/SwitchPoint.java:173
 [exec] searchConst at
org/jruby/ir/targets/ConstantLookupSite.java:103


On Fri, Jan 12, 2018 at 9:54 AM Charles Oliver Nutter 
wrote:

> I wish I could provide more info here. Just got another one in CI:
>
>  [exec] [1603/8763] 
> TestBenchmark#test_benchmark_makes_extra_calcultations_with_an_Array_at_the_end_of_the_benchmark_and_show_the_resultUnhandled
>  Java exception: java.lang.BootstrapMethodError: call site initialization 
> exception
>  [exec] java.lang.BootstrapMethodError: call site initialization exception
>  [exec]   makeSite at java/lang/invoke/CallSite.java:341
>  [exec]   linkCallSiteImpl at 
> java/lang/invoke/MethodHandleNatives.java:307
>  [exec]   linkCallSite at 
> java/lang/invoke/MethodHandleNatives.java:297
>  [exec]   block in autorun at 
> /home/travis/build/jruby/jruby/test/mri/lib/test/unit.rb:935
>  [exec] callDirect at 
> org/jruby/runtime/CompiledIRBlockBody.java:151
>  [exec]   call at org/jruby/runtime/IRBlockBody.java:77
>  [exec]   call at org/jruby/runtime/Block.java:124
>  [exec]   call at org/jruby/RubyProc.java:288
>  [exec]   call at org/jruby/RubyProc.java:272
>  [exec]   tearDown at org/jruby/Ruby.java:3276
>  [exec]   tearDown at org/jruby/Ruby.java:3249
>  [exec]internalRun at org/jruby/Main.java:309
>  [exec]run at org/jruby/Main.java:232
>  [exec]   main at org/jruby/Main.java:204
>  [exec]
>  [exec] Caused by:
>  [exec] java.lang.InternalError: 
> BMH.reinvoke=Lambda(a0:L/SpeciesData,a1:L,a2:L,a3:L)=>{
>  [exec] t4:L=Species_L.argL0(a0:L);
>  [exec] t5:L=MethodHandle.invokeBasic(t4:L,a1:L,a2:L,a3:L);t5:L}
>  [exec] newInternalError at 
> java/lang/invoke/MethodHandleStatics.java:127
>  [exec]compileToBytecode at java/lang/invoke/LambdaForm.java:660
>  [exec]  prepare at java/lang/invoke/LambdaForm.java:635
>  [exec]at java/lang/invoke/MethodHandle.java:461
>  [exec]at java/lang/invoke/BoundMethodHandle.java:58
>  [exec]at 
> java/lang/invoke/BoundMethodHandle.java:211
>  [exec] make at 
> java/lang/invoke/BoundMethodHandle.java:224
>  [exec]makeReinvoker at 
> java/lang/invoke/BoundMethodHandle.java:141
>  [exec]   rebind at 
> java/lang/invoke/DirectMethodHandle.java:130
>  [exec]  insertArguments at java/lang/invoke/MethodHandles.java:2371
>  [exec]   up at 
> com/headius/invokebinder/tr

Performance of non-static method handles

2018-02-02 Thread Charles Oliver Nutter
Hey folks!

I'm running some simple benchmarks for my FOSDEM handles talk and wanted to
reopen discussion about the performance of non-static-final method handles.

In my test, I just try to call a method that adds given argument to a
static long. The numbers for reflection and static final handle are what
I'd expect, with the latter basically being equivalent to a direct call:

Direct: 0.05ns/call
Reflected: 3ns/call
static final Handle: 0.05ns/call

If the handle is coming from an instance field or local variable, however,
performance is only slightly faster than reflection. I assume the only real
improvement in this case is that it doesn't box the long value I pass in.

local var Handle: 2.7ns/call

What can we do to improve the performance of non-static-final method handle
invocation?
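
For reference, the two shapes under comparison look roughly like this (a
sketch with illustrative names, not the actual benchmark harness); the key
difference is that the JIT can constant-fold only the handle it sees in a
static final field.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch of the two call shapes: a handle in a static final field is a
// JIT-time constant and can inline like a direct call; the same handle read
// from a mutable field must be re-examined at every call, giving
// reflection-like peak performance.
public class HandleConstness {
    static long total;
    static Object add(Object n) { total += (Long) n; return total; }

    static final MethodHandle CONSTANT_MH; // inlines
    static MethodHandle mutableMh;         // behaves more like reflection

    static {
        try {
            CONSTANT_MH = MethodHandles.lookup().findStatic(
                HandleConstness.class, "add",
                MethodType.methodType(Object.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
        mutableMh = CONSTANT_MH;
    }

    public static void main(String[] args) throws Throwable {
        Object r1 = CONSTANT_MH.invokeExact((Object) 1L); // fast path
        Object r2 = mutableMh.invokeExact((Object) 2L);   // same semantics, slower peak
        System.out.println(total); // 3
    }
}
```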

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Error, Java 8, lambda form compilation

2018-01-12 Thread Charles Oliver Nutter
I wish I could provide more info here. Just got another one in CI:

 [exec] [1603/8763]
TestBenchmark#test_benchmark_makes_extra_calcultations_with_an_Array_at_the_end_of_the_benchmark_and_show_the_resultUnhandled
Java exception: java.lang.BootstrapMethodError: call site
initialization exception
 [exec] java.lang.BootstrapMethodError: call site initialization exception
 [exec]   makeSite at java/lang/invoke/CallSite.java:341
 [exec]   linkCallSiteImpl at java/lang/invoke/MethodHandleNatives.java:307
 [exec]   linkCallSite at java/lang/invoke/MethodHandleNatives.java:297
 [exec]   block in autorun at
/home/travis/build/jruby/jruby/test/mri/lib/test/unit.rb:935
 [exec] callDirect at org/jruby/runtime/CompiledIRBlockBody.java:151
 [exec]   call at org/jruby/runtime/IRBlockBody.java:77
 [exec]   call at org/jruby/runtime/Block.java:124
 [exec]   call at org/jruby/RubyProc.java:288
 [exec]   call at org/jruby/RubyProc.java:272
 [exec]   tearDown at org/jruby/Ruby.java:3276
 [exec]   tearDown at org/jruby/Ruby.java:3249
 [exec]internalRun at org/jruby/Main.java:309
 [exec]run at org/jruby/Main.java:232
 [exec]   main at org/jruby/Main.java:204
 [exec]
 [exec] Caused by:
 [exec] java.lang.InternalError:
BMH.reinvoke=Lambda(a0:L/SpeciesData,a1:L,a2:L,a3:L)=>{
 [exec] t4:L=Species_L.argL0(a0:L);
 [exec] t5:L=MethodHandle.invokeBasic(t4:L,a1:L,a2:L,a3:L);t5:L}
 [exec] newInternalError at
java/lang/invoke/MethodHandleStatics.java:127
 [exec]compileToBytecode at java/lang/invoke/LambdaForm.java:660
 [exec]  prepare at java/lang/invoke/LambdaForm.java:635
 [exec]at java/lang/invoke/MethodHandle.java:461
 [exec]at java/lang/invoke/BoundMethodHandle.java:58
 [exec]at java/lang/invoke/BoundMethodHandle.java:211
 [exec] make at java/lang/invoke/BoundMethodHandle.java:224
 [exec]makeReinvoker at java/lang/invoke/BoundMethodHandle.java:141
 [exec]   rebind at java/lang/invoke/DirectMethodHandle.java:130
 [exec]  insertArguments at java/lang/invoke/MethodHandles.java:2371
 [exec]   up at
com/headius/invokebinder/transform/Insert.java:99


On Tue, Jan 9, 2018 at 12:18 PM Vladimir Ivanov <
vladimir.x.iva...@oracle.com> wrote:

> Thanks, Charlie.
>
> Unfortunately, it doesn't give much info without the exception which
> caused it.
>
> jdk/src/share/classes/java/lang/invoke/LambdaForm.java:
> 659 } catch (Error | Exception ex) {
> 660 throw newInternalError(this.toString(), ex);
> 661 }
>
> Best regards,
> Vladimir Ivanov
>
> On 1/9/18 9:10 PM, Charles Oliver Nutter wrote:
> > Unfortunately this just happened in one build, but I thought I'd post it
> > here for posterity.
> >
> > Unhandled Java exception: java.lang.InternalError:
> identity_L=Lambda(a0:L/SpeciesData,a1:L,a2:L)=>{
> >   [exec] t3:L=Species_L.argL0(a0:L);t3:L}
> >   [exec] java.lang.InternalError:
> identity_L=Lambda(a0:L/SpeciesData,a1:L,a2:L)=>{
> >   [exec] t3:L=Species_L.argL0(a0:L);t3:L}
> >   [exec]newInternalError at
> java/lang/invoke/MethodHandleStatics.java:127
> >   [exec]   compileToBytecode at java/lang/invoke/LambdaForm.java:660
> >   [exec] prepare at java/lang/invoke/LambdaForm.java:635
> >   [exec]   at
> java/lang/invoke/MethodHandle.java:461
> >   [exec]   at
> java/lang/invoke/BoundMethodHandle.java:58
> >   [exec]   at
> java/lang/invoke/BoundMethodHandle.java:211
> >   [exec]copyWith at
> java/lang/invoke/BoundMethodHandle.java:228
> >   [exec]   dropArguments at
> java/lang/invoke/MethodHandles.java:2465
> >   [exec]   dropArguments at
> java/lang/invoke/MethodHandles.java:2535
> >   [exec]  up at
> com/headius/invokebinder/transform/Drop.java:39
> >   [exec]  invoke at
> com/headius/invokebinder/Binder.java:1143
> >   [exec]constant at
> com/headius/invokebinder/Binder.java:1116
> >   [exec] searchConst at
> org/jruby/ir/targets/ConstantLookupSite.java:98
> >   [exec]block in autorun at
> /home/travis/build/jruby/jruby/test/mri/lib/test/unit.rb:935
> >   [exec]  callDirect at
> org/jruby/runtime/CompiledIRBlockBody.java:151
> >   [exec]call at org/jruby/runtime/IRBlockBody.java:77
 [exec]call at org/jruby/runtime/Block.java:124

Error, Java 8, lambda form compilation

2018-01-09 Thread Charles Oliver Nutter
Unfortunately this just happened in one build, but I thought I'd post it
here for posterity.

Unhandled Java exception: java.lang.InternalError:
identity_L=Lambda(a0:L/SpeciesData,a1:L,a2:L)=>{
 [exec] t3:L=Species_L.argL0(a0:L);t3:L}
 [exec] java.lang.InternalError:
identity_L=Lambda(a0:L/SpeciesData,a1:L,a2:L)=>{
 [exec] t3:L=Species_L.argL0(a0:L);t3:L}
 [exec]newInternalError at java/lang/invoke/MethodHandleStatics.java:127
 [exec]   compileToBytecode at java/lang/invoke/LambdaForm.java:660
 [exec] prepare at java/lang/invoke/LambdaForm.java:635
 [exec]   at java/lang/invoke/MethodHandle.java:461
 [exec]   at java/lang/invoke/BoundMethodHandle.java:58
 [exec]   at java/lang/invoke/BoundMethodHandle.java:211
 [exec]copyWith at java/lang/invoke/BoundMethodHandle.java:228
 [exec]   dropArguments at java/lang/invoke/MethodHandles.java:2465
 [exec]   dropArguments at java/lang/invoke/MethodHandles.java:2535
 [exec]  up at
com/headius/invokebinder/transform/Drop.java:39
 [exec]  invoke at com/headius/invokebinder/Binder.java:1143
 [exec]constant at com/headius/invokebinder/Binder.java:1116
 [exec] searchConst at
org/jruby/ir/targets/ConstantLookupSite.java:98
 [exec]block in autorun at
/home/travis/build/jruby/jruby/test/mri/lib/test/unit.rb:935
 [exec]  callDirect at
org/jruby/runtime/CompiledIRBlockBody.java:151
 [exec]call at org/jruby/runtime/IRBlockBody.java:77
 [exec]call at org/jruby/runtime/Block.java:124
 [exec]call at org/jruby/RubyProc.java:288
 [exec]call at org/jruby/RubyProc.java:272
 [exec]tearDown at org/jruby/Ruby.java:3276
 [exec]tearDown at org/jruby/Ruby.java:3249
 [exec] internalRun at org/jruby/Main.java:309
 [exec] run at org/jruby/Main.java:232
 [exec]main at org/jruby/Main.java:204

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Writing a compiler to handles, but filter seems to executed in reverse

2018-01-05 Thread Charles Oliver Nutter
Thanks a bunch y'all. I'm thinking invokebinder should do the "right thing"
and manually apply the filters in the proper order on affected JVMs...or
perhaps always. Warm-up notwithstanding, what cost would we pay to always
do single filter MHs versus doing them as a group that instead becomes
single LF adaptations?

On Wed, Jan 3, 2018, 21:22 John Rose  wrote:

> Thanks, IBM!!
>
> Filed:  https://bugs.openjdk.java.net/browse/JDK-8194554
>
> On Jan 3, 2018, at 12:04 PM, Remi Forax  wrote:
>
>
> IBM implementation uses the left to right order !
> I've just tested with the latest Java 8 available.
>
> Java(TM) SE Runtime Environment (build 8.0.5.7 -
> pxa6480sr5fp7-20171216_01(SR5 FP7))
> IBM J9 VM (build 2.9, JRE 1.8.0 Linux amd64-64 Compressed References
> 20171215_373586 (JIT enabled, AOT enabled)
> OpenJ9   - 5aa401f
> OMR  - 101e793
> IBM  - b4a79bf)
>
> so it's an implementation bug, #2 seems to be the right solution.
>
> Rémi
>
> --
>
> *De: *"John Rose" 
> *À: *"Da Vinci Machine Project" 
> *Envoyé: *Mercredi 3 Janvier 2018 20:37:42
> *Objet: *Re: Writing a compiler to handles, but filter seems to executed
> in reverse
>
> On Jan 2, 2018, at 12:35 PM, Charles Oliver Nutter 
> wrote:
>
>
> Is there a good justification for doing it this way, rather than having
>
> filterArguments start with the *last* filter nearest the target?
>
>
> No, it's a bug.  The javadoc API spec. does not emphasize the ordering
> of the filter invocations, but the pseudocode makes it pretty clear what
> order things should come in.  Certainly the spec. does not promise the
> current behavior.  When I wrote the spec. I intended the Java argument
> evaluation order to apply, and the filters to be executed left-to-right.
> And then, when I wrote the code, I used an accumulative algorithm
> with a for-each loop, leading indirectly to reverse evaluation order.
> Oops.
>
> There are two ways forward:
>
> 1. Declare the spec. ambiguous, and document the current behavior
> as the de facto standard.
>
> 2. Declare the spec. unambiguous, change the behavior to left-to-right
> as a bug fix, and clarify the spec.
>
> I think we can try for #2, on the grounds that multiple filters are a rare
> occurrence.  The risk is that existing code that uses multiple filters
> *and*
> has side effect ordering constraints between the filters will break.
>
> Question:  What does the IBM JVM do?  I think they have a very
> different implementation, and they are supposed to follow the spec.
>
> — John
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Writing a compiler to handles, but filter seems to executed in reverse

2018-01-02 Thread Charles Oliver Nutter
So I have some basic expressions working in my pseudo-compiler, and the
experiment has been interesting so far. A few things I've learned:

(for code "a = 1; b = 2; (a + b) > 1", here's the assembly
output: https://gist.github.com/headius/f765260a00590fc2b4cd033b5a657e6b)

* The approach is interesting and stretches handles quite a bit, but it
takes a long time to heat up and longer to generate native code. This may
be acceptable on platforms where user code can't load new JVM bytecode
(assuming the handle impl on that platform produces decent code).
* My mechanism of using Object[] to hold local variables seems to break
escape analysis on both hotspot and graal, probably because that array
write is too opaque to escape through. The Object[] itself is also
constructed in the same compilation unit, though, that doesn't appear to
tidy up either.
* According to LogCompilation, everything inlines, including the call to my
type-checking "add" method for the "+" call here, but...
* According to PrintAssembly, the direct handles to "add" and "lt" don't
actually appear to inline. Why?

  0x00011418f1f1: movabs rcx,0x76c5065b0;*invokestatic linkToStatic
{reexecute=0 rethrow=0 return_oop=0}
; -
java.lang.invoke.LambdaForm$DMH/2137211482::invokeStatic_LL_L@11
; -
java.lang.invoke.LambdaForm$BMH/522188921::reinvoke@50
; -
java.lang.invoke.LambdaForm$BMH/522188921::reinvoke@15
; -
java.lang.invoke.LambdaForm$MH/1973471376::identity_L@68
;   {oop(a
'java/lang/invoke/MemberName' = {method} {0x00010efd7358} 'add'
'(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in
'com/headius/jruby/HandleCompiler')}
  0x00011418f1fb: movQWORD PTR [rsp+0xb0],rax
  0x00011418f203: nop
  0x00011418f204: nop
  0x00011418f205: nop
  0x00011418f206: nop
  0x00011418f207: call   0x0001138f2420  ; OopMap{[176]=Oop off=460}
;*invokestatic linkToStatic
{reexecute=0 rethrow=0 return_oop=0}
; -
java.lang.invoke.LambdaForm$DMH/2137211482::invokeStatic_LL_L@11
; -
java.lang.invoke.LambdaForm$BMH/522188921::reinvoke@50
; -
java.lang.invoke.LambdaForm$BMH/522188921::reinvoke@15
; -
java.lang.invoke.LambdaForm$MH/1973471376::identity_L@68
;   {static_call}
  0x00011418f20c: movrsi,rax
  0x00011418f20f: movabs rdx,0x76c511bb8;   {oop(a 'java/lang/Long'
= 1)}
  0x00011418f219: movabs rcx,0x76c512068;*invokestatic linkToStatic
{reexecute=0 rethrow=0 return_oop=0}
; -
java.lang.invoke.LambdaForm$DMH/2137211482::invokeStatic_LL_L@11
; -
java.lang.invoke.LambdaForm$BMH/522188921::reinvoke@50
; -
java.lang.invoke.LambdaForm$MH/1973471376::identity_L@68
;   {oop(a
'java/lang/invoke/MemberName' = {method} {0x00010efd77d0} 'gt'
'(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in
'com/headius/jruby/HandleCompiler')}
  0x00011418f223: nop
  0x00011418f224: nop
  0x00011418f225: nop
  0x00011418f226: nop
  0x00011418f227: call   0x0001138f2420  ; OopMap{off=492}
;*invokestatic linkToStatic
{reexecute=0 rethrow=0 return_oop=0}
; -
java.lang.invoke.LambdaForm$DMH/2137211482::invokeStatic_LL_L@11
; -
java.lang.invoke.LambdaForm$BMH/522188921::reinvoke@50
    ; -
java.lang.invoke.LambdaForm$MH/1973471376::identity_L@68
;   {static_call}
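
For readers following along, here is a minimal runnable sketch (illustrative
names, not JRuby's actual compiler) of the handles-as-compiler shape described
above: each AST node compiles to a handle of type (Object[])Object, with the
Object[] carrying local-variable slots; children are attached with
filterArguments plus a permute to fan the locals array out to each child.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch of compiling "1 + 2" into a (Object[])Object handle tree.
public class HandleTreeSketch {
    // literal node: ignore the locals array, produce a constant
    static MethodHandle literal(Object v) {
        return MethodHandles.dropArguments(
            MethodHandles.constant(Object.class, v), 0, Object[].class);
    }

    // "+" node: type-checking add, as in the thread's example
    static Object add(Object a, Object b) { return (Long) a + (Long) b; }

    // Build a (Object[])Object handle for "1 + 2"
    static MethodHandle compileOnePlusTwo() throws ReflectiveOperationException {
        MethodHandle addRaw = MethodHandles.lookup().findStatic(
            HandleTreeSketch.class, "add",
            MethodType.methodType(Object.class, Object.class, Object.class));
        // feed each child the locals array, then combine the two results
        MethodHandle node = MethodHandles.filterArguments(
            addRaw, 0, literal(1L), literal(2L)); // (Object[],Object[])Object
        // fan the single locals array out to both children
        return MethodHandles.permuteArguments(node,
            MethodType.methodType(Object.class, Object[].class), 0, 0);
    }

    public static void main(String[] args) throws Throwable {
        Object result = compileOnePlusTwo().invoke(new Object[0]);
        System.out.println(result); // 3
    }
}
```

Local-variable reads and writes would follow the same pattern, as small
(Object[], ...)Object helper methods indexing into the slots array.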


On Tue, Jan 2, 2018 at 3:54 PM Charles Oliver Nutter 
wrote:

> I have released invokebinder 1.11, which includes Binder.filterForward
> that guarantees left-to-right evaluation of the filters (by doing them
> individually).
>
> I'd still like to understand if this is intentional behavior in OpenJDK or
> if it is perhaps a bug.
>
> - Charlie
>
> On Tue, Jan 2, 2018 at 3:10 PM Charles Oliver Nutter 
> wrote:
>
>> Yes, I figured I would need it for that too, but this filter behavior
>> sent me off on a weird tangent.
>>
>> It is gross in code to do the filters manually in forward order, but
>> perhaps it's not actually a big deal?

Re: Writing a compiler to handles, but filter seems to executed in reverse

2018-01-02 Thread Charles Oliver Nutter
I have released invokebinder 1.11, which includes Binder.filterForward that
guarantees left-to-right evaluation of the filters (by doing them
individually).

I'd still like to understand if this is intentional behavior in OpenJDK or
if it is perhaps a bug.

- Charlie

On Tue, Jan 2, 2018 at 3:10 PM Charles Oliver Nutter 
wrote:

> Yes, I figured I would need it for that too, but this filter behavior sent
> me off on a weird tangent.
>
> It is gross in code to do the filters manually in forward order, but
> perhaps it's not actually a big deal? OpenJDK's impl applies each filter as
> its own layer anyway.
>
> - Charlie
>
> On Tue, Jan 2, 2018 at 3:04 PM Remi Forax  wrote:
>
>> You also need the loop combinator for implementing early return (the
>> return keyword),
>> I think i have an example of how to map a small language to a loop
>> combinator somewhere,
>> i will try to find that (or rewrite it) tomorrow.
>>
>> cheers,
>> Rémi
>>
>> --
>>
>> *De: *"Charles Oliver Nutter" 
>> *À: *"Da Vinci Machine Project" 
>> *Envoyé: *Mardi 2 Janvier 2018 21:36:33
>> *Objet: *Re: Writing a compiler to handles, but filter seems to executed
>> in reverse
>>
>> An alternative workaround: I do the filters myself, manually, in the
>> order that I want them to execute. Also gross.
>>
>> On Tue, Jan 2, 2018 at 2:35 PM Charles Oliver Nutter 
>> wrote:
>>
>>> Ahh I believe I see it now.
>>> filterArguments starts with the first filter, and wraps the incoming
>>> target handle with each in turn. However, because it's starting at the
>>> target, you get the filters stacked up in reverse order:
>>>
>>> filter(target, 0, a, b, c, d)
>>>
>>> ends up as
>>>
>>> d_filter(c_filter(b_filter(a_filter(target
>>>
>>> And so naturally when invoked, they execute in reverse order.
>>>
>>> I am surprised we have not run into this as a problem before, but I
>>> believe most of my uses of filter in JRuby have been pure functions where
>>> order was not important (except for error conditions).
>>>
>>> Now in looking for a fix, I've run into the nasty workaround required to
>>> get filters to execute in the correct order: you have to reverse the
>>> filters, and then reverse the results again. This is far from desirable,
>>> since it requires at least one permute to put the results back in proper
>>> order.
>>>
>>> Is there a good justification for doing it this way, rather than having
>>> filterArguments start with the *last* filter nearest the target?
>>>
>>> - Charlie
>>>
>>> On Tue, Jan 2, 2018 at 2:17 PM Charles Oliver Nutter <
>>> head...@headius.com> wrote:
>>>
>>>> Hello all, long time no write!
>>>> I'm finally playing with writing a "compiler" for JRuby that uses only
>>>> method handles to represent code structure. For most simple expressions,
>>>> this obviously works well. However I'm having trouble with blocks of code
>>>> that contain multiple expressions.
>>>>
>>>> Starting with the standard call signature through the handle tree, we
>>>> have a basic (Object[])Object type. The Object[] contains local variable
>>>> state for the script, and will be as wide as there are local variables. AST
>>>> nodes are basically compiled into little functions that take in the
>>>> variable state and produce a value. In this way, every expression in the
>>>> tree can be compiled, including local variable sets and gets, loops, and so
>>>> on.
>>>>
>>>> Now the tricky bit...
>>>>
>>>> The root node for a given script contains one or more expressions that
>>>> should be executed in sequence, with the final result being returned. The
>>>> way I'm handling this in method handles is as follows (invokebinder code
>>>> but hopefully easy to read):
>>>>
>>>> MethodHandle[] handles =
>>>> Arrays
>>>> .stream(rootNode.children())
>>>> .map(node -> compile(node))
>>>> .toArray(n -> new MethodHandle[n]);
>>>>
>>>> return Binder.from(Object.class, Object[].class)
>>>> .permute(new int[handles.length])
>>>> .filter(0, handles)
>&g

Re: Writing a compiler to handles, but filter seems to execute in reverse

2018-01-02 Thread Charles Oliver Nutter
Yes, I figured I would need it for that too, but this filter behavior sent
me off on a weird tangent.

It is gross in code to do the filters manually in forward order, but
perhaps it's not actually a big deal? OpenJDK's impl applies each filter as
its own layer anyway.

- Charlie

On Tue, Jan 2, 2018 at 3:04 PM Remi Forax  wrote:

> You also need the loop combinator for implementing early return (the
> return keyword),
> I think i have an example of how to map a small language to a loop
> combinator somewhere,
> i will try to find that (or rewrite it) tomorrow.
>
> cheers,
> Rémi
>
> --------------
>
> *De: *"Charles Oliver Nutter" 
> *À: *"Da Vinci Machine Project" 
> *Envoyé: *Mardi 2 Janvier 2018 21:36:33
> *Objet: *Re: Writing a compiler to handles, but filter seems to execute
> in reverse
>
> An alternative workaround: I do the filters myself, manually, in the order
> that I want them to execute. Also gross.
>
> On Tue, Jan 2, 2018 at 2:35 PM Charles Oliver Nutter 
> wrote:
>
>> Ahh I believe I see it now.
>> filterArguments starts with the first filter, and wraps the incoming
>> target handle with each in turn. However, because it's starting at the
>> target, you get the filters stacked up in reverse order:
>>
>> filter(target, 0, a, b, c, d)
>>
>> ends up as
>>
>> d_filter(c_filter(b_filter(a_filter(target
>>
>> And so naturally when invoked, they execute in reverse order.
>>
>> I am surprised we have not run into this as a problem before, but I
>> believe most of my uses of filter in JRuby have been pure functions where
>> order was not important (except for error conditions).
>>
>> Now in looking for a fix, I've run into the nasty workaround required to
>> get filters to execute in the correct order: you have to reverse the
>> filters, and then reverse the results again. This is far from desirable,
>> since it requires at least one permute to put the results back in proper
>> order.
>>
>> Is there a good justification for doing it this way, rather than having
>> filterArguments start with the *last* filter nearest the target?
>>
>> - Charlie
>>
>> On Tue, Jan 2, 2018 at 2:17 PM Charles Oliver Nutter 
>> wrote:
>>
>>> Hello all, long time no write!
>>> I'm finally playing with writing a "compiler" for JRuby that uses only
>>> method handles to represent code structure. For most simple expressions,
>>> this obviously works well. However I'm having trouble with blocks of code
>>> that contain multiple expressions.
>>>
>>> Starting with the standard call signature through the handle tree, we
>>> have a basic (Object[])Object type. The Object[] contains local variable
>>> state for the script, and will be as wide as there are local variables. AST
>>> nodes are basically compiled into little functions that take in the
>>> variable state and produce a value. In this way, every expression in the
>>> tree can be compiled, including local variable sets and gets, loops, and so
>>> on.
>>>
>>> Now the tricky bit...
>>>
>>> The root node for a given script contains one or more expressions that
>>> should be executed in sequence, with the final result being returned. The
>>> way I'm handling this in method handles is as follows (invokebinder code
>>> but hopefully easy to read):
>>>
>>> MethodHandle[] handles =
>>> Arrays
>>> .stream(rootNode.children())
>>> .map(node -> compile(node))
>>> .toArray(n -> new MethodHandle[n]);
>>>
>>> return Binder.from(Object.class, Object[].class)
>>> .permute(new int[handles.length])
>>> .filter(0, handles)
>>> .drop(0, handles.length - 1)
>>> .identity();
>>>
>>> In pseudo-code, this basically duplicates the Object[] as many times as
>>> there are lines of code to execute, and then uses filterArguments to
>>> evaluate each in turn. Then everything but the last result is culled and
>>> the final result is returned.
>>>
>>> Unfortunately, this doesn't work right: filterArguments appears to
>>> execute in reverse order. When I try to run a simple script like "a = 1; a"
>>> the "a" value comes back null, because it is executed first.
>>>
>>> Is this expected? Do filters, when executed, actually process from the
>>> last argument back, rather than the first argument forward?
>>>
>>> Note: I know this would be possible to do with guaranteed ordering using
>>> the new loop combinators in 9. I'm working up to that for examples for a
>>> talk.
>>>
>>> - Charlie
>>>
>>>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>


Re: Writing a compiler to handles, but filter seems to execute in reverse

2018-01-02 Thread Charles Oliver Nutter
An alternative workaround: I do the filters myself, manually, in the order
that I want them to execute. Also gross.

On Tue, Jan 2, 2018 at 2:35 PM Charles Oliver Nutter 
wrote:

> Ahh I believe I see it now.
>
> filterArguments starts with the first filter, and wraps the incoming
> target handle with each in turn. However, because it's starting at the
> target, you get the filters stacked up in reverse order:
>
> filter(target, 0, a, b, c, d)
>
> ends up as
>
> d_filter(c_filter(b_filter(a_filter(target
>
> And so naturally when invoked, they execute in reverse order.
>
> I am surprised we have not run into this as a problem before, but I
> believe most of my uses of filter in JRuby have been pure functions where
> order was not important (except for error conditions).
>
> Now in looking for a fix, I've run into the nasty workaround required to
> get filters to execute in the correct order: you have to reverse the
> filters, and then reverse the results again. This is far from desirable,
> since it requires at least one permute to put the results back in proper
> order.
>
> Is there a good justification for doing it this way, rather than having
> filterArguments start with the *last* filter nearest the target?
>
> - Charlie
>
> On Tue, Jan 2, 2018 at 2:17 PM Charles Oliver Nutter 
> wrote:
>
>> Hello all, long time no write!
>>
>> I'm finally playing with writing a "compiler" for JRuby that uses only
>> method handles to represent code structure. For most simple expressions,
>> this obviously works well. However I'm having trouble with blocks of code
>> that contain multiple expressions.
>>
>> Starting with the standard call signature through the handle tree, we
>> have a basic (Object[])Object type. The Object[] contains local variable
>> state for the script, and will be as wide as there are local variables. AST
>> nodes are basically compiled into little functions that take in the
>> variable state and produce a value. In this way, every expression in the
>> tree can be compiled, including local variable sets and gets, loops, and so
>> on.
>>
>> Now the tricky bit...
>>
>> The root node for a given script contains one or more expressions that
>> should be executed in sequence, with the final result being returned. The
>> way I'm handling this in method handles is as follows (invokebinder code
>> but hopefully easy to read):
>>
>> MethodHandle[] handles =
>> Arrays
>> .stream(rootNode.children())
>> .map(node -> compile(node))
>> .toArray(n -> new MethodHandle[n]);
>>
>> return Binder.from(Object.class, Object[].class)
>> .permute(new int[handles.length])
>> .filter(0, handles)
>> .drop(0, handles.length - 1)
>> .identity();
>>
>> In pseudo-code, this basically duplicates the Object[] as many times as
>> there are lines of code to execute, and then uses filterArguments to
>> evaluate each in turn. Then everything but the last result is culled and
>> the final result is returned.
>>
>> Unfortunately, this doesn't work right: filterArguments appears to
>> execute in reverse order. When I try to run a simple script like "a = 1; a"
>> the "a" value comes back null, because it is executed first.
>>
>> Is this expected? Do filters, when executed, actually process from the
>> last argument back, rather than the first argument forward?
>>
>> Note: I know this would be possible to do with guaranteed ordering using
>> the new loop combinators in 9. I'm working up to that for examples for a
>> talk.
>>
>> - Charlie
>>
>>


Re: Writing a compiler to handles, but filter seems to execute in reverse

2018-01-02 Thread Charles Oliver Nutter
Ahh I believe I see it now.

filterArguments starts with the first filter, and wraps the incoming target
handle with each in turn. However, because it's starting at the target, you
get the filters stacked up in reverse order:

filter(target, 0, a, b, c, d)

ends up as

d_filter(c_filter(b_filter(a_filter(target

And so naturally when invoked, they execute in reverse order.
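The effect can be reproduced with a small standalone sketch (FilterOrder and the int-typed helpers are illustrative names, not anything from the thread): the filtered result is deterministic for pure filters, but side-effecting filters expose the invocation order.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class FilterOrder {
    static final StringBuilder ORDER = new StringBuilder();

    // Each filter records its name before transforming its argument.
    static int tag(String name, int x) { ORDER.append(name); return x + 1; }
    static int sum(int a, int b) { return a + b; }

    static int run() throws Throwable {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodHandle tag = l.findStatic(FilterOrder.class, "tag",
            MethodType.methodType(int.class, String.class, int.class));
        MethodHandle target = l.findStatic(FilterOrder.class, "sum",
            MethodType.methodType(int.class, int.class, int.class));
        // filter(target, 0, a, b): each argument gets its own filter.
        MethodHandle filtered = MethodHandles.filterArguments(target, 0,
            MethodHandles.insertArguments(tag, 0, "a"),
            MethodHandles.insertArguments(tag, 0, "b"));
        return (int) filtered.invokeExact(1, 2);   // (1+1) + (2+1) = 5
    }

    public static void main(String[] args) throws Throwable {
        // The sum is always 5; ORDER came out "ba" on the JDKs discussed
        // in this thread (filters applied innermost-first).
        System.out.println(run() + " order=" + ORDER);
    }
}
```

Running this on a JDK of that era shows the reversal directly, while the computed value stays the same because the filters are independent per argument.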

I am surprised we have not run into this as a problem before, but I
believe most of my uses of filter in JRuby have been pure functions where
order was not important (except for error conditions).

Now in looking for a fix, I've run into the nasty workaround required to
get filters to execute in the correct order: you have to reverse the
filters, and then reverse the results again. This is far from desirable,
since it requires at least one permute to put the results back in proper
order.

Is there a good justification for doing it this way, rather than having
filterArguments start with the *last* filter nearest the target?

- Charlie

On Tue, Jan 2, 2018 at 2:17 PM Charles Oliver Nutter 
wrote:

> Hello all, long time no write!
>
> I'm finally playing with writing a "compiler" for JRuby that uses only
> method handles to represent code structure. For most simple expressions,
> this obviously works well. However I'm having trouble with blocks of code
> that contain multiple expressions.
>
> Starting with the standard call signature through the handle tree, we have
> a basic (Object[])Object type. The Object[] contains local variable state
> for the script, and will be as wide as there are local variables. AST nodes
> are basically compiled into little functions that take in the variable
> state and produce a value. In this way, every expression in the tree can be
> compiled, including local variable sets and gets, loops, and so on.
>
> Now the tricky bit...
>
> The root node for a given script contains one or more expressions that
> should be executed in sequence, with the final result being returned. The
> way I'm handling this in method handles is as follows (invokebinder code
> but hopefully easy to read):
>
> MethodHandle[] handles =
> Arrays
> .stream(rootNode.children())
> .map(node -> compile(node))
> .toArray(n -> new MethodHandle[n]);
>
> return Binder.from(Object.class, Object[].class)
> .permute(new int[handles.length])
> .filter(0, handles)
> .drop(0, handles.length - 1)
> .identity();
>
> In pseudo-code, this basically duplicates the Object[] as many times as
> there are lines of code to execute, and then uses filterArguments to
> evaluate each in turn. Then everything but the last result is culled and
> the final result is returned.
>
> Unfortunately, this doesn't work right: filterArguments appears to execute
> in reverse order. When I try to run a simple script like "a = 1; a" the "a"
> value comes back null, because it is executed first.
>
> Is this expected? Do filters, when executed, actually process from the
> last argument back, rather than the first argument forward?
>
> Note: I know this would be possible to do with guaranteed ordering using
> the new loop combinators in 9. I'm working up to that for examples for a
> talk.
>
> - Charlie
>
>


Writing a compiler to handles, but filter seems to execute in reverse

2018-01-02 Thread Charles Oliver Nutter
Hello all, long time no write!

I'm finally playing with writing a "compiler" for JRuby that uses only
method handles to represent code structure. For most simple expressions,
this obviously works well. However I'm having trouble with blocks of code
that contain multiple expressions.

Starting with the standard call signature through the handle tree, we have
a basic (Object[])Object type. The Object[] contains local variable state
for the script, and will be as wide as there are local variables. AST nodes
are basically compiled into little functions that take in the variable
state and produce a value. In this way, every expression in the tree can be
compiled, including local variable sets and gets, loops, and so on.

Now the tricky bit...

The root node for a given script contains one or more expressions that
should be executed in sequence, with the final result being returned. The
way I'm handling this in method handles is as follows (invokebinder code
but hopefully easy to read):

MethodHandle[] handles =
    Arrays
        .stream(rootNode.children())
        .map(node -> compile(node))
        .toArray(n -> new MethodHandle[n]);

return Binder.from(Object.class, Object[].class)
    .permute(new int[handles.length])
    .filter(0, handles)
    .drop(0, handles.length - 1)
    .identity();

In pseudo-code, this basically duplicates the Object[] as many times as
there are lines of code to execute, and then uses filterArguments to
evaluate each in turn. Then everything but the last result is culled and
the final result is returned.

Unfortunately, this doesn't work right: filterArguments appears to execute
in reverse order. When I try to run a simple script like "a = 1; a" the "a"
value comes back null, because it is executed first.

Is this expected? Do filters, when executed, actually process from the last
argument back, rather than the first argument forward?

Note: I know this would be possible to do with guaranteed ordering using
the new loop combinators in 9. I'm working up to that for examples for a
talk.
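For strict left-to-right evaluation without reversed filters, a chain of foldArguments also works, since a fold always runs its combiner before its target. A minimal sketch under the same (Object[])Object convention (Seq, step, and run are illustrative names, not JRuby code):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.List;

public class Seq {
    static final StringBuilder LOG = new StringBuilder();

    // Stand-in for one compiled expression: records its name, returns a value.
    static Object step(String name, Object[] frame) {
        LOG.append(name);
        return name;
    }

    // Compose (Object[])Object handles so they run left-to-right and the
    // last result is returned: foldArguments always runs its combiner
    // before its target, so a chain of folds fixes the evaluation order.
    static MethodHandle sequence(List<MethodHandle> steps) {
        MethodHandle seq = steps.get(steps.size() - 1);
        for (int i = steps.size() - 2; i >= 0; i--) {
            // Drop the step's return value so it acts purely for effect.
            MethodHandle effect = steps.get(i)
                .asType(MethodType.methodType(void.class, Object[].class));
            seq = MethodHandles.foldArguments(seq, effect);
        }
        return seq;
    }

    static Object run() throws Throwable {
        MethodHandle step = MethodHandles.lookup().findStatic(Seq.class, "step",
            MethodType.methodType(Object.class, String.class, Object[].class));
        MethodHandle seq = sequence(List.of(
            MethodHandles.insertArguments(step, 0, "a"),
            MethodHandles.insertArguments(step, 0, "b"),
            MethodHandles.insertArguments(step, 0, "c")));
        return (Object) seq.invokeExact(new Object[0]);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(run() + " log=" + LOG);   // c log=abc
    }
}
```

This trades the single filterArguments adapter for one fold per expression, which is deeper but has a guaranteed order.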

- Charlie


Managing JNI resources in a classloader that might go away

2017-02-24 Thread Charles Oliver Nutter
Hey smart folks, I have a conundrum and Google is failing me.

As you may know, we have been maintaining the Java Native Runtime libraries
for providing FFI from Java pre-Panama. These libraries handle loading and
binding C functions to Java endpoints automagically.

Unfortunately, jffi -- the base library of the JNR stack -- has a few
off-heap structures it allocates to support FFI calls. Those structures are
generally held in static fields and cleaned up via finalization.

This seems to be a somewhat fatal design flaw in situations where the
classloader that started up jffi might go away long before the JVM shuts
down.

I've got a segfault, and all signs point toward it being a case of trying
to call the JNI C code in jffi *after* the classloader has finalized and
unloaded the library. The si_addr of the SIGSEGV and the top frame of the
stack are the same address, which tells me that the segfault was caused by
trying to call the JNI C code, which in this case is custom code to clean
up those off-heap resources.

I have found no easy answer to this problem. You can't tell when your
classloader unloads, and as far as I can tell you can't tell that the
JNI library has gone away. And of course you can't guarantee finalization
order. Sometimes, it works fine. But eventually, it fails. My logging of
classloader finalization versus data freeing ends like this:

classloader finalized 2014779152
freeing in 2014779152
freeing in 2014779152

I have not come up with any solution. These off-heap structures are tied to
the lifecycle of the JNI backend, but there's no obvious way to clean them
up just before the JNI backend gets unloaded.

So, questions:

1. Does it seem like I'm on the right track?
2. Anyone have ideas for dealing with this? My best idea right now is to
add a bunch of smarts to JNI_OnUnload that tidies everything up, rather
than allowing finalization to do it at some indeterminate time in the
future.
3. Why does JNI + classloading suck so bad?

Frustratedly yours,

- Charlie


Classloading glitch in publicLookup.findConstructor?

2017-02-24 Thread Charles Oliver Nutter
We've had a number of reports of LinkageError-related problems with a
constructor handle acquired through publicLookup.

Here's the commit that fixed the problem, with a link to a PR explaining
the error:

https://github.com/jruby/jruby/commit/32926ac194c03f0e61c0121e9da0b0427cfa5869

It seems like the error indicates a class is getting loaded into the
bootstrap classloader during lookup when it should not, and as a result any
child classloaders that load it later on have a conflicting copy.

Thoughts? This is tested on a very recent Java 8.

- Charlie


Re: Leaking LambdaForm classes?

2017-01-06 Thread Charles Oliver Nutter
On Fri, Jan 6, 2017 at 2:59 PM, Vladimir Ivanov <
vladimir.x.iva...@oracle.com> wrote:

> LambdaForm caches deliberately keep LF instances using SoftReferences.
>
> The motivation is:
>   (1) LFs are heavily shared;
>   (2) LFs are expensive to construct (LF interpreter is turned off by
> default now); it involves the following steps: new LF instance + compile to
> bytecode + class loading.
>
> So, keeping a LF instance for a while usually pays off, especially during
> startup/warmup. There should be some heap/metaspace pressure to get them
> cleared.
>
> As a workaround, try -XX:SoftRefLRUPolicyMSPerMB=0 to make soft references
> behave as weak.


I'll pass that along, thank you. I'm not sure how vigorously he's tried to
get GC to clear things out.

Not sure the problem relates to j.l.i & LFs since the report says indy in
> jruby is turned off. For heavy usages of indy/j.l.i 1000s of LFs are
> expected (<5k). The question is how does the count change over time.
>

JRuby has progressed to the point of using method handles and indy all the
time, since for some cases the benefits are present without any issues.
"Enabling" indy in JRuby mostly just turns on the use of indy for method
call sites and instance variables now.

That said, the numbers this user is reporting do seem really high, which is
why I asked in here for similar stories. Even if we considered a very large
Ruby application with many hundreds of files, we'd still see non-indy MH
usages in JRuby in thousands at best (mostly for "constant" lookup sites).

- Charlie


Leaking LambdaForm classes?

2017-01-06 Thread Charles Oliver Nutter
Anyone else encountered this?

https://github.com/jruby/jruby/issues/4391

We have a user reporting metaspace getting filled up with LambdaForm
classes that have no instances. I would not expect this to happen given
that they're generated via AnonymousClassloader and we would need to hold a
reference to them to keep them alive.

I'm trying to get a heap dump from this user. If anyone has other
suggestions, feel free to comment on the issue.

- Charlie


Re: series of switchpoints or better

2016-10-05 Thread Charles Oliver Nutter
On Wed, Oct 5, 2016 at 6:26 PM, Jochen Theodorou  wrote:

> There is one more special problem I have though: per instance meta
> classes. So even if a x and y have the same class as per JVM, they can have
> differing meta classes. Which means a switchpoint alone is not enough...
> well, trying to get rid of that in the new MOP.


JRuby also has per-instance classes (so-called "singleton classes"). We
treat them like any other class. HOWEVER...if there's a singleton class
that does not override any methods from the original class, it shares a
SwitchPoint until such time that it is modified.

I've also considered caching singleton classes of various shapes, so we can
just choose based on known shapes...but never went further with that
experiment.

- Charlie


Re: series of switchpoints or better

2016-10-05 Thread Charles Oliver Nutter
On Oct 5, 2016 17:43, "Jochen Theodorou"  wrote:
> I see... the problem is actually similar, only that I do not have to do
something like that on a per "subclass added" event, but on a per "method
crud operation" event. And instead of going up to check for a
devirtualization, I have to actually propagate the change to all meta
classes of subclasses... and interface implementation (if the change was
made to an interface). So far I was thinking of making this lazy... but
maybe I should actually mark the classes as "dirty" eagerly... sorry... not
part of the discussion I guess ;)

Oh I think it is certainly relevant! JRuby does this invalidation eagerly,
but the cost can be high for changes to classes close to the root of the
hierarchy. You have fewer guards at each call site, though.

John's description of how Hotspot does this is also helpful; at least in
JRuby, searching up-hierarchy for overridden methods is just a name lookup
since Ruby does not overload. I've prototyped a similar system, with a
SwitchPoint per method, but ran into some hairy class structures that made
it complicated. The override search may be the answer for me.

- Charlie (mobile)


Re: EXT: Re: series of switchpoints or better

2016-10-05 Thread Charles Oliver Nutter
On Wed, Oct 5, 2016 at 1:36 PM, Jochen Theodorou  wrote:

> If I hear Remi saying volatile read... then it does not sound free to me
> actually. In my experience volatile reads still present inlining barriers.
> But if Remi and all of you tell me it is still basically free, then I will
> not look too much at the volatile ;)
>

The volatile read is only used in the interpreter.

In Groovy we use SwitchPoint as well, but only one for the whole meta class
> system that could clearly improved it seems. Having a Switchpoint per
> method is actually a very interesting approach I would not have considered
> before, since it means creating a ton of Switchpoint objects. Not sure if
> that works in practice for me since it is difficult to make a switchpoint
> for a method that does not exist in the super class, but may come into
> existence later on - still it seems I should be considering this.
>

I suspect Groovy developers are also less likely to modify classes at
runtime? In Ruby, it's not uncommon to keep creating new classes or
modifying existing ones at runtime, though it is generally discouraged (all
runtimes suffer).


> cold performance is a consideration for me as well though. The heavy
> creation time of MethodHandles is one of the reasons we do not use
> invokedynamic as much as we could... especially considering that creating a
> new cache entry via runtime class generation and still invoking the method
> via reflection is actually faster than producing one of our complex method
> handles right now.
>

Creating a new cache entry via class generation? Can you elaborate on that?
JRuby has a non-indy mode, but it doesn't do any code generation per call
site.


> As for Charles question:
>
>> Can you elaborate on the structure? JRuby has 6-deep (configurable)
>> polymorphic caching, with each entry being a GWT (to check type) and a SP
>> (to check modification) before hitting the plumbing for the method itself.
>>
>
> right now we use a 1-deep cache with several GWT (check type and argument
> types) and one SP plus several transformations. My goal is of course also
> the 6-deep polymorphic caching in the end. Just motivation for this was not
> so high before. If I use several SwitchPoint, then of course each of them
> would be there for each cache entry. How many depends on the receiver type.
> But at least one for each super class (and interface)
>

Ahh, so when you invalidate, you only invalidate one class, but every call
site would have a SwitchPoint for the target class and all of its
superclasses. That will be more problematic for cold performance than
JRuby's way, but less overhead when invalidating. I'm not sure which trade-off
is better.

We also use this invalidation mechanism when calling dynamic methods from
Java (since we also use call site caches there) but those sites are not
(yet) guarded by a SwitchPoint.


> To my horror I just found one piece of code commented with:
> //TODO: remove this method if possible by switchpoint usage
>

With recent improvements to MH boot time and cold performance, I've started
to use indy by default in more places, carefully measuring startup overhead
along the way. I'm well on my way toward having fully invokedynamic-aware
jitted code basically be all invokedynamics.


> It is also good to hear that the old "once invalidated, it will not be
> optimized again - ever" is no longer valid.
>

And hopefully it will stay that way as long as we keep making noise :-)

- Charlie


Re: series of switchpoints or better

2016-10-05 Thread Charles Oliver Nutter
Hi Jochen!

On Wed, Oct 5, 2016 at 7:37 AM, Jochen Theodorou  wrote:
>
> If the meta class for A is changed, all handles operating on instances of
> A may have to reselect. the handles for B and Object need not to be
> affected. If the meta class for Object changes, I need to invalidate all
> the handles for A, B and Object.
>

This is exactly how JRuby's type-modification guards work. We've used this
technique since our first implementation of indy call sites.


> Doing this with switchpoints means probably one switchpoint per metaclass
> and a small number of meta classes per class (in total 3 in my example).
> This would mean my MethodHandle would have to get through a bunch of
> switchpoints, before it can do the actual method invocation. And while
> switchpoints might be fast it does not sound good to me.
>

From what I've seen, it's fine as far as hot performance goes. Adding complexity
to your handle chains likely impacts cold perf, of course.

Can you elaborate on the structure? JRuby has 6-deep (configurable)
polymorphic caching, with each entry being a GWT (to check type) and a SP
(to check modification) before hitting the plumbing for the method itself.
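That entry shape can be sketched in plain method-handle code; the class name and the int-typed toy signatures are illustrative, not JRuby's actual plumbing:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

public class CacheEntry {
    static boolean positive(int x) { return x > 0; }
    static int hit(int x)  { return x + 1; }   // cached target
    static int miss(int x) { return x - 1; }   // stands in for the slow path

    // One polymorphic-cache entry: a type guard (GWT) wrapped in a
    // modification guard (SwitchPoint).
    static MethodHandle cacheEntry(SwitchPoint sp, MethodHandle typeTest,
                                   MethodHandle target, MethodHandle fallback) {
        MethodHandle typed = MethodHandles.guardWithTest(typeTest, target, fallback);
        return sp.guardWithTest(typed, fallback);
    }

    static int[] run() throws Throwable {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodType t = MethodType.methodType(int.class, int.class);
        SwitchPoint sp = new SwitchPoint();
        MethodHandle entry = cacheEntry(sp,
            l.findStatic(CacheEntry.class, "positive",
                MethodType.methodType(boolean.class, int.class)),
            l.findStatic(CacheEntry.class, "hit", t),
            l.findStatic(CacheEntry.class, "miss", t));
        int before = (int) entry.invokeExact(5);
        SwitchPoint.invalidateAll(new SwitchPoint[]{sp});   // "class modified"
        int after = (int) entry.invokeExact(5);
        return new int[]{before, after};
    }

    public static void main(String[] args) throws Throwable {
        int[] r = run();
        System.out.println(r[0] + " then " + r[1]);   // 6 then 4
    }
}
```

Before invalidation the entry takes the type-guarded fast path; after invalidateAll every future call falls through to the fallback, which in a real site would relink.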

I will say that using SwitchPoints is FAR better than our alternative
mechanism: pinging the (meta)class each time and checking a serial number.


> Or I can do one switchpoint for all methodhandles in the system, which
> makes me wonder if after a meta class change the callsite ever gets Jitted
> again. The later performance penalty is actually also not very attractive
> to me.
>

We have fought to keep the JIT from giving up on us, and I believe that as
of today you can invalidate call sites forever and the JIT will still
recompile them (within memory, code cache, and other limits of course).

However, you'll be invalidating every call site for every modification. If
the system eventually settles, that's fine. If it doesn't, you're going to
be stuck with cold call site performance most of the time.


> So what is the way to go here? Or is there an even better way?
>

I strongly recommend the switchpoint-per-class granularity (or finer, like
switchpoint-per-class-and-method-name, which I am playing with now).

- Charlie


Re: ClassValue perf?

2016-05-02 Thread Charles Oliver Nutter
On Wed, May 6, 2015 at 6:36 PM, Jochen Theodorou  wrote:

> Charlie, did you ever get to writing some benchmarks?
>

Unfortunately not but we are getting into a performance phase over the next
couple months. I'll see what I can come up with.

- Charlie


Re: Getting back into indy...need a better argument collector!

2016-01-14 Thread Charles Oliver Nutter
Thanks Duncan. I will try to look under the covers this evening.

- Charlie (mobile)
On Jan 14, 2016 14:39, "MacGregor, Duncan (GE Energy Management)" <
duncan.macgre...@ge.com> wrote:

> On 11/01/2016, 11:27, "mlvm-dev on behalf of MacGregor, Duncan (GE Energy
> Management)"  duncan.macgre...@ge.com> wrote:
>
> >On 11/01/2016, 03:16, "mlvm-dev on behalf of Charles Oliver Nutter"
> >
> >wrote:
> >...
> >>With asCollector: 16-17s per iteration
> >>
> >>With hand-written array construction: 7-8s per iteration
> >>
> >>A sampling profile only shows my Ruby code as the top items, and an
> >>allocation trace shows Object[] as the number one object being
> >>created...not IRubyObject[]. Could that be the reason it's slower?
> >>Some type trickery messing with optimization?
> >>
> >>This is very unfortunate because there's no other general-purpose way
> >>to collect arguments in a handle chain.
> >
> >I haven't done any comparative benchmarks in that area for a while, but
> >collecting a single argument is a pretty common pattern in the Magik code,
> >and I had not seen any substantial difference when we last touched that
> >area. However, we are collecting to plain Object[], so it might be that this
> >is the reason for the difference. If I've got time later this week I'll do
> >some experimenting and check what the current situation is.
>
> Okay, I’ve now had a chance to try this with our language benchmarks
> and can’t see any significant difference between a hand-crafted method and
> asCollector, but we are dealing with Object and Object[], so it might be
> something to do with additional casting.
>
> Duncan.
>


Getting back into indy...need a better argument collector!

2016-01-10 Thread Charles Oliver Nutter
Hello folks! Now that we're a few months into JRuby 9000 I've started
to hack on the indy bindings again. Things are looking good so far.
I'm working on getting closures to inline where they're invoked by
chaining together a number of GWT just like a polymorphic call site.

Anyway, my discovery today was that it's too expensive to collect a
bunch of arguments at the end of an argument list right now.

For one place in closure dispatch, we need to box a single incoming
argument in an array. In InvokeBinder code, that looks like this:

```
Binder.from(IRubyObject[].class, IRubyObject.class).collect(0,
IRubyObject[].class).identity();
```

Since there's only a single argument and we're boxing to the end of
the list, this turns into a handle.asCollector followed by an
identity.

Unfortunately, it's MUCH faster to just bind to a hand-written method
that constructs the array directly.

```
Binder.from(IRubyObject[].class, IRubyObject.class)
    .invokeStaticQuiet(MethodHandles.lookup(),
        CompiledIRBlockBody.class, "wrapValue");

private static IRubyObject[] wrapValue(IRubyObject value) {
    return new IRubyObject[] {value};
}
```
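For contrast, the asCollector form above boils down to something like the following self-contained sketch (using Object in place of IRubyObject so it compiles standalone):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class CollectDemo {
    public static void main(String[] args) throws Throwable {
        // identity(Object[].class) has type (Object[])Object[]; asCollector
        // turns the trailing Object[] parameter into a 1-argument collector,
        // so the resulting handle allocates a fresh one-element array per
        // call -- mirroring collect(...).identity() from the message above.
        MethodHandle box = MethodHandles.identity(Object[].class)
                .asCollector(Object[].class, 1);
        Object v = "hello";
        Object[] arr = (Object[]) box.invokeExact(v);
        System.out.println(arr.length + " " + arr[0]); // 1 hello
    }
}
```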

I was running a benchmark that calls a method with a closure 10M
times, and the method yields back to the closure 25 times, so a total
of 250M closure dispatches passing through the above adapter (among
others). Everything seems to fit together nicely, though I haven't
checked inlining yet.

With asCollector: 16-17s per iteration

With hand-written array construction: 7-8s per iteration

A sampling profile only shows my Ruby code as the top items, and an
allocation trace shows Object[] as the number one object being
created...not IRubyObject[]. Could that be the reason it's slower?
Some type trickery messing with optimization?

This is very unfortunate because there's no other general-purpose way
to collect arguments in a handle chain.

- Charlie


ClassValue rooting objects after it goes away?

2015-08-06 Thread Charles Oliver Nutter
Pardon me if this has been discussed before, but we had a bug (with
fix) reported today that seems to indicate that the JVM is rooting
objects put into a ClassValue even if the ClassValue goes away.

Here's the pull request: https://github.com/jruby/jruby/pull/3228

And here's one example of the root trace leading back to our JRuby
runtime. All the roots appear to be VM-level code:

https://dl.dropboxusercontent.com/u/9213410/class-values-leak.png

Is this expected? If we have to stuff a WeakReference into the
ClassValue it seriously diminishes its utility to us.
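The workaround alluded to here would look roughly like the following sketch (the metaclass stand-in is hypothetical); the lost utility is that get() can hand back a cleared reference after a GC:

```java
import java.lang.ref.WeakReference;

// Sketch: the ClassValue holds only a weak reference, so the metaclass
// cannot be rooted through the ClassValue. The cost is that callers must
// check for a cleared reference and arrange recomputation themselves.
final class WeakMetaclassValue extends ClassValue<WeakReference<Object>> {
    @Override
    protected WeakReference<Object> computeValue(Class<?> type) {
        return new WeakReference<>(createMetaclass(type));
    }

    // stand-in for real metaclass construction (hypothetical)
    private static Object createMetaclass(Class<?> type) {
        return "metaclass:" + type.getName();
    }
}
```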

- Charlie


Re: ClassValue perf?

2015-04-30 Thread Charles Oliver Nutter
On Mon, Apr 27, 2015 at 12:50 PM, Jochen Theodorou  wrote:
> Am 27.04.2015 19:17, schrieb Charles Oliver Nutter:
>> Jochen: Is your class-to-metaclass map usable apart from the Groovy
>> codebase?
>
>
> Yes. Look for org.codehaus.groovy.reflection.GroovyClassValuePreJava7 which
> is normally wrapped by a factory.

Excellent, thank you!

- Charlie


Re: ClassValue perf?

2015-04-30 Thread Charles Oliver Nutter
On Wed, Apr 29, 2015 at 4:02 AM, Doug Simon  wrote:
> We considered using ClassValue in Graal for associating each Node with its 
> NodeClass. Accessing the NodeClass is a very common operation in Graal (e.g., 
> it’s used to iterate over a Node’s inputs). However, brief experimentation 
> showed implementing this with ClassValue performed significantly worse than a 
> direct field access[1]. We currently use ClassValue to link Class values with 
> their Graal mirrors. Accessing this link is infrequent enough that the 
> performance trade off against injecting a field to java.lang.Class[2] is 
> acceptable.

That's what I'm banking on too. My case is similar to Groovy's: I need
a way to *initially* get the metaclass for a given JVM class. Unlike
Groovy, however, we still have to wrap Java objects in a JRuby-aware
wrapper, so subsequent accesses of the class via that object are via a
plain field. So the impact of ClassValue will mostly be at the border
between Ruby and Java, when we need to initially build that wrapper
and put some metaclass in it.

Of course the disadvantage of the wrapper is the wrapper itself. If we
could inject our IRubyObject interface into java.lang.Object my life
would be much better. But I digress.

> The memory footprint improvement suggested in JDK-8031043 would still help.

I'll have to take a look at that. We're pretty memory-sensitive since
Ruby's already fairly heap-intensive.

- Charlie


Re: ClassValue perf?

2015-04-27 Thread Charles Oliver Nutter
It seems I may have to write some benchmarks for this then. Just so I
understand, the equivalent non-ClassValue-based store would need to:

* Be atomic; value may calculate more than once but only be set once.
* Be weak; classes given class values must not be rooted as a result
(an external impl like in JRuby or Groovy would have to use weak maps
for this).
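For reference, ClassValue itself provides both properties: computeValue may race, but only one winner is ever installed per class, and the association is intended not to root the class. A minimal sketch:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ClassValueDemo {
    static final AtomicInteger computeCount = new AtomicInteger();

    static final ClassValue<String> METACLASS = new ClassValue<String>() {
        @Override
        protected String computeValue(Class<?> type) {
            // may run more than once if threads race on the same class...
            computeCount.incrementAndGet();
            return "meta:" + type.getName();
        }
    };

    public static void main(String[] args) {
        // ...but every get() observes the single installed winner
        String a = METACLASS.get(String.class);
        String b = METACLASS.get(String.class);
        System.out.println(a.equals(b) && a.equals("meta:java.lang.String")); // true
    }
}
```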

Jochen: Is your class-to-metaclass map usable apart from the Groovy codebase?

- Charlie

On Mon, Apr 27, 2015 at 11:40 AM, Christian Thalinger
 wrote:
>
> On Apr 24, 2015, at 2:17 PM, John Rose  wrote:
>
> On Apr 24, 2015, at 5:38 AM, Charles Oliver Nutter 
> wrote:
>
>
> Hey folks!
>
> I'm wondering how the performance of ClassValue looks on recent
> OpenJDK 7 and 8 builds. JRuby 9000 will be Java 7+ only, so this is
> one place I'd like to simplify our code a bit.
>
> I could measure myself, but I'm guessing some of you have already done
> a lot of exploration or have benchmarks handy. So, what say you?
>
>
> I'm listening too.  We don't have any special optimizations for CVs,
> and I'm hoping the generic code is a good-enough start.
>
>
> A while ago (wow; it’s more than a year already) I was working on:
>
> [#JDK-8031043] ClassValue's backing map should have a smaller initial size -
> Java Bug System
>
> and we had a conversation about it:
>
> http://mail.openjdk.java.net/pipermail/mlvm-dev/2014-January/005597.html
>
> It’s not about performance directly but it’s about memory usage and maybe
> the one-value-per-class optimization John suggests is in fact a performance
> improvement.  Someone should pick this one up.
>
> — John
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
>
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>


ClassValue perf?

2015-04-24 Thread Charles Oliver Nutter
Hey folks!

I'm wondering how the performance of ClassValue looks on recent
OpenJDK 7 and 8 builds. JRuby 9000 will be Java 7+ only, so this is
one place I'd like to simplify our code a bit.

I could measure myself, but I'm guessing some of you have already done
a lot of exploration or have benchmarks handy. So, what say you?

- Charlie


Re: IntelliJ debugger?

2015-04-06 Thread Charles Oliver Nutter
I have never gotten either Netbeans' or IntelliJ's debuggers to step
through source without a .java extension (speaking specifically of
Ruby code, even if source dirs and JSR-45 stuff are in place).

I can easily get jdb to step through any language's source, so I know
it's not a problem with how I'm emitting debug info.

- Charlie

On Fri, Apr 3, 2015 at 1:51 AM, Dain Sundstrom  wrote:
> So I did a bunch more testing, and this is what I found:
>
> - The IntelliJ debugger ignores the “source” declaration in the class file 
> and instead always looks for a “.java” file in the source path
> - The file must contain a java class declaration with the same name
> - The file must be “recognized” by IntelliJ before the debugger stops, so you 
> can’t dynamically generate a bogus java file
> - If the file is not present, the debugger will not show local variables
> - The debugger seems to ignore local variable type declarations, so the 
> “Evaluate Expressions” window does not get type ahead (but works otherwise).
>
> I might try adding a JSR-45 SMAP, but I don’t have high hopes based on 
> Charlie’s comments at the last summit.
>
> Does anyone else have any ideas on things that might work?
>
> -dain
>
> On Apr 1, 2015, at 11:08 PM, Dain Sundstrom  wrote:
>
>> Hi all,
>>
>> I think this might have been asked before... Has anyone gotten the intelliJ 
>> debugger to step through the source file for their language?
>>
>> Adding the source and line numbers during generation makes stack traces
>> come out correctly, and IntelliJ even opens the correct file location.
>> During debugging, I can see the correct source and line numbers, but
>> IntelliJ doesn’t open the file.
>>
>> I’d even be ok with a hacky solution where I rename all of my files to be 
>> “x.java”.
>>
>> Thanks,
>>
>> -dain
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Lost perf between 8u40 and 9 hs-comp

2015-03-06 Thread Charles Oliver Nutter
Ok, now we're cracking! Performance has definitely returned, and
actually improved 15-20% beyond my current copy of 8u40. Bravo!

I will try testing several other benchmarks, and perhaps set up a
machine to do the big perf regression suite the JRuby+Truffle guys
made for us.

FWIW, the additional "Per" flags did not appear to help performance,
and actually seemed to degrade it almost back to where 8u40 lies.

- Charlie

On Fri, Mar 6, 2015 at 7:06 AM, Vladimir Ivanov
 wrote:
> John,
>
> You are absolutely right. I should've spent more time exploring the code
> than writing emails :-)
>
> Here's the fix:
> http://cr.openjdk.java.net/~vlivanov/8074548/webrev.00/
>
> Charlie, I'd love to hear your feedback on it. It fixes the regression on
> bench_red_black.rb for me.
>
> Also, please, try -XX:PerBytecodeRecompilationCutoff=-1
> -XX:PerMethodRecompilationCutoff=-1 (to workaround another problem I spotted
> [1]).
>
> On 3/4/15 5:16 AM, John Rose wrote:
>>
>> On Mar 3, 2015, at 3:21 PM, Vladimir Ivanov 
>> wrote:
>>>
>>>
>>> Ah, I see now.
>>>
>>> You suggest to conditionally insert uncommon trap in MHI.profileBoolean
>>> when a count == 0, right?
>>>
>>> Won't we end up with 2 checks if VM can't fold them (e.g. some action in
>>> between)?
>>
>>
>> Maybe; that's the weak point of the idea.  The VM *does* fold many
>> dominating ifs, as you know.
>>
>> But, if the profileBoolean really traps on one branch, then it can return
>> a *constant* value, can't it?
>>
>> After that, the cmps and ifs will fold up.
>
> Brilliant idea! I think the JIT can find that out itself, but additional help
> is always useful.
>
> The real weak point IMO is that we need to keep MHI.profileBoolean intrinsic
> and never-taken branch pruning logic during parsing (in parse2.cpp) to keep
> in sync. Otherwise, if VM starts to prune rarely taken branches at some
> point, we can end up in the same situation.
>
> Best regards,
> Vladimir Ivanov
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8074551
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: What can we improve in JSR292 for Java 9?

2015-03-04 Thread Charles Oliver Nutter
On Thu, Feb 26, 2015 at 4:27 AM, Jochen Theodorou  wrote:
> my biggest request: allow the call of a super constructor (like
> super(foo,bar)) using MethodHandles an have it understood by the JVM like a
> normal super constructor call... same for this(...)

Just so I understand...the problem is that unless you can get a Lookup
that can do the super call from Java (i.e. from within a subclass),
you can't get a handle that can do the super call, right? And you
can't do that because the method bodies might not be emitted into a
natural subclass of the super class?
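For context, the working case — a Lookup created inside the subclass — is sketched below; it is exactly this Lookup provenance that generated method bodies outside a natural subclass lack:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class SuperCallDemo {
    static class Base { String name() { return "base"; } }

    static class Sub extends Base {
        @Override String name() { return "sub"; }

        // findSpecial requires the specialCaller to be the lookup class
        // itself, so only a lookup captured inside the subclass can bind
        // the invokespecial-style super call
        static MethodHandle superName() throws ReflectiveOperationException {
            return MethodHandles.lookup().findSpecial(
                    Base.class, "name",
                    MethodType.methodType(String.class), Sub.class);
        }
    }

    public static void main(String[] args) throws Throwable {
        Sub s = new Sub();
        System.out.println((String) Sub.superName().invokeExact(s)); // base
    }
}
```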

- Charlie


Re: What can we improve in JSR292 for Java 9?

2015-03-04 Thread Charles Oliver Nutter
Busy week, finally circling back to this thread...

On Wed, Feb 25, 2015 at 8:29 PM, John Rose  wrote:
>> * A loop handle :-)
>>
>> Given a body and a test, run the body until the test is false. I'm
>> guessing there's a good reason we don't have this already.
>
> A few reasons:   1. You can code your own easily.

I can't code one that will specialize for every call path, though,
unless I generate a new loop body for every call path.

> 2. There's no One True Loop the way there is a One True If.
> The "run until test is false" model assumes all the real work is
> done with side-effects, which are off-center from the MH model.

This I can appreciate. My mental model of MHs started to trend toward
a general-purpose IR, and I believe if I had some sort of backward
branch it could be that. But I understand if that's the wrong
conceptual model, and I realize now that MHs are basically call stack
adapters with a bit of forward branching thrown in.

It does feel like there's a need for better representation of
branch-joining or phi or whatever you want to call it, though.

> 3. A really clean looping mechanism probably needs a sprinkle
> of tail call optimization.
>
> I'm not saying that loops should never have side effects, but I
> am saying that a loop mechanism should not mandate them.
>
> Maybe this is general enough:
>
> MHs.loop(init, predicate, body)(*a)
> => { let i = init(*a); while (predicate(i, *a)) { i = body(i, *a); } 
> return i; }
>
> ...where the type of i depends on init, and if init returns void then you
> have a classic side-effect-only loop.

Ahh yes, this makes sense. If it were unrolled, it would simply be a
series of folds and drops as each iteration through the body modified
the condition in some way. So then we just need it to work without
unrolling.
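For the record, this shape later shipped in Java 9 as MethodHandles.whileLoop(init, pred, body), matching the sketch above; a minimal example:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class LoopDemo {
    public static int start() { return 1; }
    public static boolean pred(int i) { return i < 100; }
    public static int body(int i) { return i * 2; }

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodHandle init = l.findStatic(LoopDemo.class, "start",
                MethodType.methodType(int.class));
        MethodHandle pred = l.findStatic(LoopDemo.class, "pred",
                MethodType.methodType(boolean.class, int.class));
        MethodHandle body = l.findStatic(LoopDemo.class, "body",
                MethodType.methodType(int.class, int.class));
        // semantics: i = init(); while (pred(i)) { i = body(i); } return i;
        MethodHandle loop = MethodHandles.whileLoop(init, pred, body);
        System.out.println((int) loop.invokeExact()); // 128
    }
}
```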

My silly use case for this would be to emit simple expressions
entirely as a MH chain, so we'd get the benefit of MH optimizations
without generating our own bytecode (and with forced inlining and
perhaps a richer semantic representation than just bytecode). It's not
a very compelling case, of course, since I could just emit bytecode
too.

>> * try/finally as a core atom of MethodHandles API.
>>
>> Libraries like invokebinder provide a shortcut API To generating the
>> large tree of handles needed for try/finally, but the JVM may not be
>> able to optimize that tree as well as a purpose-built adapter.
>
> I agree there.  We should put this in.
>
>MHs.tryFinally(target, cleanup)(*a)
>  => { try { return target(*a); } finally { cleanup(*a); } }
>
> (Even here there are non-universalities; what if the cleanup
> wants to see the return value and/or the thrown exception?
> Should it take those as one or two leading arguments?)

In InvokeBinder, the finally is expected to require no additional
arguments compared to the try body, since that was the use case I
needed. You bring up a good point...and perhaps the built-in JSR292
tryFinally should take *two* handles: one for the exceptional path
(with exception in hand) and one for the non-exceptional path (with
return value in hand)? The exceptional path would be expected to
return the same type as the try body or re-raise the exception. The
non-exceptional path would be expected to return void.

> We now have MHs.collectArguments.  Do you want MHs.spreadArguments
> to reverse the effect?  Or is there something else I'm missing?

I want to be able to group arguments at any position in the argument
list. Example:

Incoming signature: (String foo, int a1, int a2, int a3, Object obj)
Target signature: (String foo, int[] as, Object obj)

... without permuting arguments
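The MHs.collectArguments John mentions above, combined with identity().asCollector(), does cover this mid-list case without permutes; a sketch of the exact signatures from the example:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

public class GroupDemo {
    // target signature from the example: (String foo, int[] as, Object obj)
    public static String target(String foo, int[] as, Object obj) {
        return foo + Arrays.toString(as) + obj;
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle target = MethodHandles.lookup().findStatic(
                GroupDemo.class, "target",
                MethodType.methodType(String.class,
                        String.class, int[].class, Object.class));
        // collector of type (int,int,int)int[] built from identity + asCollector
        MethodHandle collect3 = MethodHandles.identity(int[].class)
                .asCollector(int[].class, 3);
        // collectArguments splices the collector's parameters in at position 1,
        // yielding (String, int, int, int, Object)String -- no permutes needed
        MethodHandle adapted = MethodHandles.collectArguments(target, 1, collect3);
        System.out.println((String) adapted.invokeExact("foo:", 1, 2, 3, (Object) "obj"));
        // prints: foo:[1, 2, 3]obj
    }
}
```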

> Idea of the day:  An ASM-like library for method handles.
> Make a MethodHandleReader which can run a visitor over the MH.
> The ops of the visitor would be a selection of public MH operations
> like filter, collect, spread, lookup, etc.
> Also ASM-like, the library would have a MethodHandleWriter
> would could be hooked up with the reader to make filters.

That would certainly cover it! I'd expect this to add:

* MethodHandleVisitor interface of some kind
* MethodHandle#accept(MethodHandleVisitor) or similar
* MethodHandleType enum with all the base MH types (so we're not
forcing all types into a static interface).

- Charlie


Re: Lost perf between 8u40 and 9 hs-comp

2015-03-03 Thread Charles Oliver Nutter
Thanks for looking into it Vladimir...I'm standing by to test out anything!

- Charlie

On Tue, Mar 3, 2015 at 10:23 AM, Vladimir Ivanov
 wrote:
> John,
>
>> So let's make hindsight work for us:  Is there a way (either with or
>> without the split you suggest) to more firmly couple the update to the
>> query?  Separating into two operations might be the cleanest way to go, but
>> I think it's safer to keep both halves together, as long as the slow path
>> can do the right stuff.
>>
>> Suggestion:  Instead of have the intrinsic expand to nothing, have it
>> expand to an uncommon trap (on the slow path), with the uncommon trap doing
>> the profile update operation (as currently coded).
>
> Right now, VM doesn't care about profiling logic at all. The intrinsic is
> used only to inject profile data and all profiling happens in Java code.
> Once MHI.profileBoolean is intrinsified (profile is injected), no profiling
> actions are performed.
>
> The only way I see is to inject a count bump on the pruned branch before
> issuing the uncommon trap. Like profile_taken_branch in Parse::do_if, but it
> should update a user-specified int[2] rather than the MDO.
>
> It looks irregular and spreads profiling logic between VM & Java code. But
> it allows to keep single entry point between VM & Java (MHI.profileBoolean).
>
> I'll prototype it to see how it looks at the code level.
>
> Best regards,
> Vladimir Ivanov
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


What can we improve in JSR292 for Java 9?

2015-02-25 Thread Charles Oliver Nutter
After talking with folks at the Jfokus VM Summit, it seems like
there's a number of nice-to-have and a few need-to-have features we'd
like to see get into java.lang.invoke. Vladimir suggested I start a
thread on these features.

A few from me:

* A loop handle :-)

Given a body and a test, run the body until the test is false. I'm
guessing there's a good reason we don't have this already.

* try/finally as a core atom of MethodHandles API.

Libraries like invokebinder provide a shortcut API To generating the
large tree of handles needed for try/finally, but the JVM may not be
able to optimize that tree as well as a purpose-built adapter.

* Argument grouping operations in the middle of the argument list.

JRuby has many signatures that vararg somewhere other than the end of
the argument list, and the juggling required to do that logic in
handles is complex: shift to-be-boxed args to end, box them, shift box
back.

Another point about these more complicated forms: they're ESPECIALLY
slow early in execution, before LFs have been compiled to bytecode.

* Implementation-specific inspection API.

I know there are different ways to express a MH tree on different JVMs
(e.g. J9) but it would still be a big help for me if there were a good
way to get some debug-time structural information about a handle I'm
using. Hidden API would be ok if it's not too hidden :-)

That's off the top of my head. Others?

- Charlie


Lost perf between 8u40 and 9 hs-comp

2015-02-25 Thread Charles Oliver Nutter
I'm finally at home with a working machine so I can follow up on some
VM Summit to-dos.

Vladimir wanted me to test out jdk9 hs-comp, which has all his latest
work on method handles. I wish I could report that performance looks
great, but it doesn't.

Here's timing (in s) of our red/black benchmark on JRuby 1.7.19, first
on the latest (as of today) 8u40 snapshot build and then on a
minutes-old jdk9 hs-comp build:

~/projects/jruby $ (pickjdk 4 ; rvm jruby-1.7.19 do ruby
-Xcompile.invokedynamic=true ../rubybench/time/bench_red_black.rb 10)
New JDK: jdk1.8.0_40.jdk
5.206
2.497
0.69
0.703
0.72
0.645
0.698
0.673
0.685
0.67

~/projects/jruby $ (pickjdk 5 ; rvm jruby-1.7.19 do ruby
-Xcompile.invokedynamic=true ../rubybench/time/bench_red_black.rb 10)
New JDK: jdk1.9_hs-comp
5.048
3.773
1.836
1.474
1.366
1.394
1.249
1.399
1.352
1.346

Perf is just about 2x slower on jdk9 hs-comp.

I tried out a few other benchmarks, which don't seem to have as much variation:

* recursive fib(35): equal perf
* mandelbrot: jdk8u40 5% faster
* protobuf: jdk9 5% faster

The benchmarks are in jruby/rubybench on Github. JRuby 1.7.19 can be
grabbed from jruby.org or built from jruby/jruby (see BUILDING.md).

Looking forward to helping improve this :-)

- Charlie


Re: Invokedynamic and recursive method call

2015-01-07 Thread Charles Oliver Nutter
This could explain performance regressions we've seen on the
performance of heavily-recursive algorithms. I'll try to get an
assembly dump for fib in JRuby later today.

- Charlie

On Wed, Jan 7, 2015 at 10:13 AM, Remi Forax  wrote:
>
> On 01/07/2015 10:43 AM, Marcus Lagergren wrote:
>>
>> Remi, I tried to reproduce your problem with jdk9 b44. It runs decently
>> fast.
>
>
> yes, nashorn is fast enough but it can be faster if the JIT was not doing
> something stupid.
>
> When the VM inlines fibo, because fibo is recursive the recursive call is
> inlined only once,
> so the call at depth=2 cannot be inlined but should be a classic direct
> call.
>
> But if fibo is called through an invokedynamic, instead of emitting a direct
> call to fibo,
> the JIT generates code that pushes the method handle on the stack and
> executes it as if the method handle were not constant
> (the method handle is constant because the call at depth=1 is inlined!).
>
>> When did it start to regress?
>
>
> jdk7u40, i believe.
>
> I've created a jar containing some handwritten bytecodes with no dependency
> to reproduce the issue easily:
>   https://github.com/forax/vmboiler/blob/master/test7/fibo7.jar
>
> [forax@localhost test7]$ time /usr/jdk/jdk1.9.0/bin/java -cp fibo7.jar
> FiboSample
> 1836311903
>
> real0m6.653s
> user0m6.729s
> sys0m0.019s
> [forax@localhost test7]$ time /usr/jdk/jdk1.8.0_25/bin/java -cp fibo7.jar
> FiboSample
> 1836311903
>
> real0m6.572s
> user0m6.591s
> sys0m0.019s
> [forax@localhost test7]$ time /usr/jdk/jdk1.7.0_71/bin/java -cp fibo7.jar
> FiboSample
> 1836311903
>
> real0m6.373s
> user0m6.396s
> sys0m0.016s
> [forax@localhost test7]$ time /usr/jdk/jdk1.7.0_25/bin/java -cp fibo7.jar
> FiboSample
> 1836311903
>
> real0m4.847s
> user0m4.832s
> sys0m0.019s
>
> as you can see, it was faster with a JDK before jdk7u40.
>
>>
>> Regards
>> Marcus
>
>
> cheers,
> Rémi
>
>
>>
>>> On 30 Dec 2014, at 20:48, Remi Forax  wrote:
>>>
>>> Hi guys,
>>> I've found a bug in the interaction between the lambda form and inlining
>>> algorithm,
>>> basically if the inlining heuristic bailout because the method is
>>> recursive and already inlined once,
>>> instead to emit a code to do a direct call, it revert to do call to
>>> linkStatic with the method
>>> as MemberName.
>>>
>>> I think it's a regression because before the introduction of lambda
>>> forms,
>>> I'm pretty sure that the JIT was emitting a direct call.
>>>
>>> Step to reproduce with nashorn, run this JavaScript code
>>> function fibo(n) {
>>>   return (n < 2)? 1: fibo(n - 1) + fibo(n - 2)
>>> }
>>>
>>> print(fibo(45))
>>>
>>> like this:
>>>   /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions
>>> -J-XX:+PrintAssembly fibo.js > log.txt
>>>
>>> look for a method 'fibo' from the tail of the log, you will find
>>> something like this:
>>>
>>>   0x7f97e4b4743f: mov$0x76d08f770,%r8   ;   {oop(a
>>> 'java/lang/invoke/MemberName' = {method} {0x7f97dcff8e40} 'fibo'
>>> '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' in
>>> 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')}
>>>   0x7f97e4b47449: xchg   %ax,%ax
>>>   0x7f97e4b4744b: callq  0x7f97dd0446e0
>>>
>>> I hope this can be fixed. My demonstration that I can have fibo written
>>> with a dynamic language
>>> that run as fast as written in Java doesn't work anymore :(
>>>
>>> cheers,
>>> Rémi
>>>
>>> ___
>>> mlvm-dev mailing list
>>> mlvm-dev@openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>>
>> ___
>> mlvm-dev mailing list
>> mlvm-dev@openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Truffle and mlvm

2014-10-03 Thread Charles Oliver Nutter
world” as close as possible.
> > There is absolutely no reason to believe that a Truffle-based Ruby
> > implementation would not have benefits for “real-world applications”. Or
> > that it would not be able to run a large application for a long time. It
> is
> > clear that the TruffleRuby prototype needs more completeness work both at
> > the language and the library level. We are very happy with the results we
> > got so far with Chris working for about a year. We are planning to
> increase
> > the number of people working on this, and would also be grateful for any
> > help we can get from the Ruby community.
> >
> > Regarding Graal:  Did you ever try to benchmark JRuby without Truffle
> with
> > the latest Graal binaries available at
> > http://lafo.ssw.uni-linz.ac.at/builds/? We would be looking forward to
> > see the peak performance results on a couple of workloads. We are not
> > speculating about Graal becoming part of a particular OpenJDK release (as
> > experimental or regular option). This is the sovereign decision of the
> > OpenJDK community. All we can do is to demonstrate and inform about
> Graal’s
> > performance and stability.
> >
> > We recognise that there is a long road ahead. But in particular in this
> > context, I would like to emphasize that we are looking for more people to
> > support this effort for a new language implementation platform. I
> strongly
> > believe that Truffle is the best currently available vehicle to make Ruby
> > competitive in terms of performance with node.js. We are happy to try to
> > *prove* you wrong - even happier about support of any kind along the road
> > ;). I am also looking forward to continue this discussion at JavaOne (as
> > part of the TruffleRuby session or elsewhere).
> >
> > Regards, thomas
> >
> > On 30 Aug 2014, at 21:21, Charles Oliver Nutter 
> > wrote:
> >
> > > Removing all context, so it's clear this is just my opinions and
> > thoughts...
> > >
> > > As most of you know, we've opened up our codebase and incorporated the
> > > graciously-donated RubyTruffle directly into JRuby. It's available on
> > > JRuby master and we are planning to ship Truffle support with JRuby
> > > 9000, our next major version (due out in the next couple months).
> > >
> > > At the same time, we have been developing our own next-gen IR-based
> > > compiler, which will run unmodified on any JVM (with or without
> > > invokedynamic, though I still have to implement the "without" side).
> > > Why are we doing this when Truffle shows such promise?
> > >
> > > I'll try to enumerate the benefits and problems of Truffle here.
> > >
> > > * Benefits of using Truffle
> > >
> > > 1. Simpler implementation.
> > >
> > > From day 1, the most obvious benefit of Truffle is that you just have
> > > to write an AST interpreter. Anyone who has implemented a programming
> > > language can do this easily. This specific benefit doesn't help us
> > > implement JRuby, since we already have an AST interpreter, but it did
> > > make Chris Seaton's job easier building RubyTruffle initially. This
> > > also means a Truffle-based language is more approachable than one with
> > > a complicated compiler pipeline of its own.
> > >
> > > 2. Better communication with the JIT.
> > >
> > > Truffle, via Graal, has potential to pass much more information on to
> > > the JIT. Things like type shape, escaped references, frame access,
> > > type specialization, and so on can be communicated directly, rather
> > > than hoping and praying they'll be inferred by the shape of bytecodes.
> > > This is probably the largest benefit; much of my time optimizing JRuby
> > > has been spend trying to "trick" C2 into doing the right thing, since
> > > I don't have a direct way to communicate intent.
> > >
> > > The peak performance numbers for Truffle-based languages have been
> > > extremely impressive. If it's possible to get those numbers reasonably
> > > quickly and with predictable steady-state behavior in large,
> > > heterogeneous codebases, this is definitely the quickest path (on any
> > > runtime) to a high-performance language implementation.
> > >
> > > 3. OSS and pure Java
> > >
> > > Truffle and Graal are just OpenJDK projects under OpenJDK licenses,
> > > and anyone can build, hack, or distribute them. In addition, both
>

Re: The Great Startup Problem

2014-09-01 Thread Charles Oliver Nutter
On Mon, Sep 1, 2014 at 2:07 AM, Vladimir Ivanov
 wrote:
> Stack usage won't be constant though. Each compiled LF being executed
> consumes 1 stack frame, so for a method handle chain of N elements, it's
> invocation consumes ~N stack frames.
>
> Is it acceptable and solves the problem for you?

This is acceptable for JRuby. Our worst-case Ruby method handle chain
will include at most:

* Two CatchExceptions for pre/post logic (heap frames, etc). Perf of
CatchException compared to literal Java try/catch is important here.
* Up to two permute arguments for differing call site/target argument ordering.
* Varargs negotiation (may be a couple handles)
* GWT
* SwitchPoint
* For Ruby to Java calls, each argument plus the return value must be
filtered to convert to/from Ruby types or apply an IRubyObject wrapper

This is worst case, mind you. Most calls in the system will be
arity-matched, eliminating the permutes. Most calls will be three or
fewer arguments, eliminating varargs. Many calls will be optimized to
no longer need a heap frame, eliminating the try/finally. The absolute
minimum for any call would be SwitchPoint plus GWT.
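That minimal shape — a SwitchPoint wrapped around a GWT — looks roughly like this sketch (class names and the guard logic are made up for illustration):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

public class GuardChainDemo {
    public static boolean isExpectedClass(Object self) { return self instanceof String; }
    public static String fast(Object self) { return "fast:" + self; }
    public static String slow(Object self) { return "slow:" + self; }

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodType t = MethodType.methodType(String.class, Object.class);
        MethodHandle guard = l.findStatic(GuardChainDemo.class, "isExpectedClass",
                MethodType.methodType(boolean.class, Object.class));
        MethodHandle fast = l.findStatic(GuardChainDemo.class, "fast", t);
        MethodHandle slow = l.findStatic(GuardChainDemo.class, "slow", t);

        // the GWT picks fast vs slow per call; the SwitchPoint invalidates
        // the whole site (e.g. on method redefinition), falling back to slow
        SwitchPoint sp = new SwitchPoint();
        MethodHandle site = sp.guardWithTest(
                MethodHandles.guardWithTest(guard, fast, slow), slow);

        System.out.println((String) site.invokeExact((Object) "x")); // fast:x
        SwitchPoint.invalidateAll(new SwitchPoint[] { sp });
        System.out.println((String) site.invokeExact((Object) "x")); // slow:x
    }
}
```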

Of course I'm not counting DMHs here, since they're either the call we
want to make or they're leaf logic.

> We discussed an idea to generate custom bytecodes (single method) for the
> whole method handle chain (and have only 1 extra stack frame per MH
> invocation), but it defeats memory footprint reduction we are trying to
> archieve with LambdaForm sharing.

Funny thing...because indy slows our startup and increases our warmup
time, we're using our old binding logic by default. And surprise
surprise, our old binding logic does exactly this...one small
generated invoker class per method. I'm sure you're right that this
approach defeats the sharing and memory reduction we'd like to see
from LFs, but it works *really* well if you're ok with the extra class
and metaspace data in memory.

So there's one question: is the cost of a bytecoded adapter shim for
each method object really that high? Yes, if you're spinning new MHs
constantly or doing a million different adaptations of a given method.
But if you're just lazily creating an invoker shim once per method,
that really doesn't seem like a big deal.

My indy binding logic also has a dozen different flags for tweaking. I
can easily modify it to avoid doing all that pre/post logic and
argument permutation in the MH chain and just bind directly to the
generated invoker. Best (or worst) of both worlds? I just really don't
want to have to do that...I want everything from call site to target
method body to be in the MH chain.

For JRuby 9000, all try/finally logic will be within the target
method, so at least that part of the MH chain goes away.

Here's another idea...

We've been using my InvokeBinder library heavily in JRuby. It provides
a Java API/DSL for creating MH chains lazily from the top down:

MethodHandle mh = Binder.from(String.class, Object.class, Float.class)
.tryFinally(finallyLogic)
.permute(1, 0)
.append("Hello")
.drop(1)
.invokeStatic(MyClass.class, "someMethod");

The adaptations are gathered within the Binder instance, playing
forward as you add adaptations and played backward at binding time to
make the appropriate MethodHandles and MethodHandle calls.
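For readers unfamiliar with InvokeBinder, the chain above unrolls into roughly the following plain MethodHandles calls, applied to the target in reverse order of the Binder steps. The tryFinally step is omitted (InvokeBinder predates java.lang.invoke's tryFinally combinator, which only arrived in Java 9), and a local method stands in for the hypothetical MyClass.someMethod:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import static java.lang.invoke.MethodType.methodType;

public class Unrolled {
    // Stand-in for MyClass.someMethod from the example above.
    public static String someMethod(Float f, String s) {
        return s + " " + f;
    }

    public static MethodHandle build() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle mh = lookup.findStatic(Unrolled.class, "someMethod",
                methodType(String.class, Float.class, String.class));
        // .drop(1) played backward: reintroduce the dropped Object at position 1
        mh = MethodHandles.dropArguments(mh, 1, Object.class);   // (Float, Object, String)
        // .append("Hello") played backward: bind the trailing constant
        mh = MethodHandles.insertArguments(mh, 2, "Hello");      // (Float, Object)
        // .permute(1, 0) played backward: map incoming (Object, Float) to (Float, Object)
        mh = MethodHandles.permuteArguments(mh,
                methodType(String.class, Object.class, Float.class), 1, 0);
        return mh;                                               // (Object, Float) -> String
    }
}
```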

Duncan talked about how he was able to improve MH chain size and
performance by applying certain transformations in a different order,
among other things. InvokeBinder *could* be doing a lot more to
optimize the MH chain. For example, the above case never uses the
Object value passed in (it is permuted to position 1 and later
dropped), but that fact is obscured by the intervening append.

InvokeBinder is basically doing with MHs what MHs do with LFs. Perhaps
what we really need is a more holistic view of MH + LF operations
*together* so we can boil the whole thing down (even across MH lines)
before we start interpreting or compiling it?

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Class hierarchy analysis and CallSites

2014-09-01 Thread Charles Oliver Nutter
On Mon, Sep 1, 2014 at 8:46 AM, MacGregor, Duncan (GE Energy
Management)  wrote:
> Has anybody else tried doing this sort of thing as part of their 
> invokeDynamic Implementation? I’m curious if anybody has data comparing the 
> speed of GWT & class comparison based PICs with checks that require getting a 
> ClassValue and doing a Map or Set lookup?

I've thought about trying this. Here's the short version of JRuby's
class hierarchy + lookup mechanism:

* Each class has a map of methods and a lookup cache. The method map
only contains methods from that class, but the lookup cache (which all
lookups pass through) may eventually hold all methods from
superclasses as well. This is class-level method caching.
* Call sites look up method on the receiver's natural class, which
will populate an entry in the class-level lookup cache if none is
there.
* The lookup cache entries are a tuple of class serial number + method
object. The serial number represents the class's version at lookup
time.
* When any method table changes, the serial numbers of all child
classes get bumped, so their already-cached lookup entries are now
invalid. Upon next lookup, a stale entry will be replaced.
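The scheme above can be sketched as a toy model. The class and entry names are illustrative only; JRuby's real RubyModule/CacheEntry types differ in detail:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MetaClass {
    static final class CacheEntry {
        final long token;      // class serial number at lookup time
        final String method;   // stand-in for the real method object
        CacheEntry(long token, String method) { this.token = token; this.method = method; }
    }

    private static long nextSerial = 1;

    final MetaClass superClass;
    final List<MetaClass> children = new ArrayList<>();
    final Map<String, String> methodTable = new HashMap<>();     // this class's methods only
    final Map<String, CacheEntry> lookupCache = new HashMap<>(); // may fill with inherited methods
    long serial = nextSerial++;

    MetaClass(MetaClass superClass) {
        this.superClass = superClass;
        if (superClass != null) superClass.children.add(this);
    }

    // Any method table change bumps this class and every descendant,
    // invalidating their already-cached lookup entries.
    void define(String name, String body) {
        methodTable.put(name, body);
        bumpSerial();
    }

    private void bumpSerial() {
        serial = nextSerial++;
        for (MetaClass child : children) child.bumpSerial();
    }

    // All lookups pass through the cache; a stale entry (token != serial)
    // is simply replaced on the next lookup.
    CacheEntry searchWithCache(String name) {
        CacheEntry entry = lookupCache.get(name);
        if (entry != null && entry.token == serial) return entry;
        entry = new CacheEntry(serial, search(name));
        lookupCache.put(name, entry);
        return entry;
    }

    private String search(String name) {
        for (MetaClass c = this; c != null; c = c.superClass) {
            String m = c.methodTable.get(name);
            if (m != null) return m;
        }
        return null;
    }
}
```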

On the call site guard side, we have used both serial number
comparison and class reference comparison. Using the serial number has
a couple of advantages: no hard references to classes in the call site, classes that
are identical can be made to look identical by reusing serial number,
etc. The disadvantage is that it requires an additional dereference to
get the current serial number off incoming classes every time. For
class comparison, we dereference the metaclass field on the object to
get a RubyClass object reference, store that at the call site, and do
direct referential comparisons in the call site guard.
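The two guard strategies can be sketched as follows. RClass and RObject are stand-ins for JRuby's RubyClass and IRubyObject; this is an illustration of the shapes, not JRuby's actual code:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import static java.lang.invoke.MethodType.methodType;

public class Guards {
    public static class RClass { public long serial; }
    public static class RObject {
        public final RClass meta;
        public RObject(RClass meta) { this.meta = meta; }
    }

    // Serial-number guard: one extra dereference (object -> class -> serial)
    // on every call, but no hard class reference held by the site.
    public static boolean serialGuard(long expected, RObject self) {
        return self.meta.serial == expected;
    }

    // Class-reference guard: direct referential comparison against the
    // class cached at the call site.
    public static boolean classGuard(RClass expected, RObject self) {
        return self.meta == expected;
    }

    // Build a guarded site using the class-reference strategy.
    public static MethodHandle site(MethodHandle cached, MethodHandle fallback,
                                    RClass seen) throws ReflectiveOperationException {
        MethodHandle test = MethodHandles.lookup()
                .findStatic(Guards.class, "classGuard",
                        methodType(boolean.class, RClass.class, RObject.class))
                .bindTo(seen);   // pin the class observed at link time
        return MethodHandles.guardWithTest(test, cached, fallback);
    }
}
```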

I should note that most objects in JRuby are of the same JVM type. We
generally don't stand up a JVM class for every Ruby class, since Ruby
classes are sometimes (frequently?) created and thrown away as part of
normal execution.

This could be considered a sort of CHA, since the serial number
indicates not just the version of the class, but the version of all
its ancestors. It could be improved, however, to be a calculated value
based on the actual shape of the class, so two different classes with
the same superclass and no methods of their own would look the same to
the guard. I have only done basic experiments here.

- Charlie

>
> Duncan.


Re: Truffle and mlvm

2014-08-30 Thread Charles Oliver Nutter
Removing all context, so it's clear this is just my opinions and thoughts...

As most of you know, we've opened up our codebase and incorporated the
graciously-donated RubyTruffle directly into JRuby. It's available on
JRuby master and we are planning to ship Truffle support with JRuby
9000, our next major version (due out in the next couple months).

At the same time, we have been developing our own next-gen IR-based
compiler, which will run unmodified on any JVM (with or without
invokedynamic, though I still have to implement the "without" side).
Why are we doing this when Truffle shows such promise?

I'll try to enumerate the benefits and problems of Truffle here.

* Benefits of using Truffle

1. Simpler implementation.

From day 1, the most obvious benefit of Truffle is that you just have
to write an AST interpreter. Anyone who has implemented a programming
language can do this easily. This specific benefit doesn't help us
implement JRuby, since we already have an AST interpreter, but it did
make Chris Seaton's job easier building RubyTruffle initially. This
also means a Truffle-based language is more approachable than one with
a complicated compiler pipeline of its own.

2. Better communication with the JIT.

Truffle, via Graal, has potential to pass much more information on to
the JIT. Things like type shape, escaped references, frame access,
type specialization, and so on can be communicated directly, rather
than hoping and praying they'll be inferred by the shape of bytecodes.
This is probably the largest benefit; much of my time optimizing JRuby
has been spent trying to "trick" C2 into doing the right thing, since
I don't have a direct way to communicate intent.

The peak performance numbers for Truffle-based languages have been
extremely impressive. If it's possible to get those numbers reasonably
quickly and with predictable steady-state behavior in large,
heterogeneous codebases, this is definitely the quickest path (on any
runtime) to a high-performance language implementation.

3. OSS and pure Java

Truffle and Graal are just OpenJDK projects under OpenJDK licenses,
and anyone can build, hack, or distribute them. In addition, both
Truffle and Graal are 100% Java, so for the first time a plain old
Java developer can see (and manipulate) exactly how the JIT works
without getting lost in a sea of plus plus.

* Problems with Truffle

I want to emphasize that regardless of its warts, we love Truffle and
Graal and we see great potential here. But we need a dose of reality
once in a while, too.

1. AST is not enough.

In order to make that AST fly, you can't just implement a dumb generic
interpreter. You need to know about (and generously annotate your AST
for) many advanced compiler optimization techniques:

A. Type specialization plus guarded fallbacks: Truffle will NOT
specialize your code for you. You must provide every specialized path
in your AST nodes as well as annotating "slow path", "transfer to
interpreter", etc.

B. Frame access and reification: In order to have cross-call access to
frames or to squash frames created for multiple inlined calls, you
must use Truffle's representation of a frame. This means loads/stores
within your AST must be done against a Truffle object, not against an
arbitrary object of your own creation.

C. Method invocation and inlining: Up until fairly recently, if you
wanted to inline methods you had to essentially build your own call
site logic, profiling, deopt paths within your Truffle AST. When I did
a little hacking on RubyTruffle around OSS time (December/January) it
did *no* inlining of Ruby-to-Ruby calls. I hacked in inlining using
existing classes and managed to get it to work, but I was doing all
the plumbing myself. I know this has improved in the Truffle codebase
since then, but I have my concerns about production readiness when the
inlining call site parts of Truffle were just recently added and are
still in flux.

And there's plenty of other cases. Building a basic language for
Truffle is pretty easy (I did a micro-language in about two hours at
JVMLS last year), but building a high-performance language for Truffle
still takes a fair investment of effort and working knowledge of
dynamic compiler optimizations.

2. Long startup and warmup times.

As Thomas pointed out in the other thread, because Truffle and Graal
are normally run as plain Java libraries, they can actually aggravate
startup time issues. Now, not only would all of JRuby have to warm up,
but the eventual native code JIT has to warm up too. This is not
surprising, really. It is possible to mitigate this by doing some form
of AOT against Graal, but for every case I have seen the Truffle/Graal
approach makes startup time much, much worse compared to just running
atop JVM.

Warmup time is also worsened significantly.

The AST you create for Truffle must be heavily mutated while running
in order to produce a specialized version of that AST. This must
happen before the AST is eventually fed 

Re: Defining anonymous classes

2014-08-25 Thread Charles Oliver Nutter
On Fri, Aug 15, 2014 at 5:39 PM, John Rose  wrote:
> If the host-class token were changed to a MethodHandles.Lookup object, we 
> could restrict the host-class to be one which the user already had 
> appropriate access to.  Seems simple, but of course the rest of the project 
> is complicated:   API design, spec completion, security analysis, positive 
> and negative test creation, code development, quality assurance—all these 
> would be expensive, and (again) most easily justified in the context of a 
> larger refresh of our classfile format.

Sounds like a good candidate to be a standalone project first. I think
we have the right people on this list to do it (and many of us have
already done large portions of that work on our own already).

> Or, most or all of dAC could be simulated using regular class loading, into a 
> single-use ClassLoader object.  The nominal bytecodes would have to be 
> rewritten to use invokedynamic to manage the linking, at least to host-class 
> names.  But given that ASM is inside the JDK, the tools are all available.  
> (Remi could do most of it in an afternoon. :-) )  Given such a simulation, 
> the internal dAC mechanism could be used as an optimization, when available, 
> but there would be a standard (complex) semantics derived from ordinary 
> classes and indy.

This is how JRuby has survived for years. A classloader-per-class has
a big memory load (ClassLoader has a lot of internal state, classes
have a lot of metadata) but with permgen bumped up (or replaced with
metaspace as in 8) and a few reuse tricks, it hasn't been a major
issue for us. In JRuby, the following are all generated into
single-shot classloaders:

* Ruby methods JIT-compiled to bytecode at runtime
* Wrapper logic ("invokers") around AOT or JIT-compiled method and
closure bodies (including core methods written in Java)
* Synthetic interface implementations and subclasses (for implementing
or extending from Ruby)
* Most other one-off pieces of bytecode we generate at runtime. We
almost never need to look up those classes once created and
instantiated.
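The single-shot-classloader pattern itself is small enough to sketch in a few lines. This is a minimal illustration, not JRuby's actual loader:

```java
// Minimal single-use ClassLoader: define one class, then drop the loader
// reference so the class can be unloaded once its instances are gone.
public class OneShotLoader extends ClassLoader {
    public OneShotLoader(ClassLoader parent) {
        super(parent);   // everything else delegates to the real loader
    }

    public Class<?> define(String name, byte[] bytecode) {
        return defineClass(name, bytecode, 0, bytecode.length);
    }
}
```

The reuse tricks mentioned above amount to keeping a loader around for related one-off classes where possible instead of paying the per-loader overhead every time.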

- Charlie


Re: The Great Startup Problem

2014-08-25 Thread Charles Oliver Nutter
On Mon, Aug 25, 2014 at 6:59 AM, Fredrik Öhrström  wrote:
> Calle Wilund and I implemented such a indy/methodhandle solution for
> JRockit, so I know it works. You can see a demonstration here:
> http://medianetwork.oracle.com/video/player/589206011001 That
> implementations jump to C-code that performed the invoke call, no fancy
> optimizations. Though the interpreter implementation of invoke can be
> optimized as well, that was the first half of the talk is about. But its
> really not that important for speed, because the speed comes from inlining
> the invoke call chain as early as possible after detecting that an indy is
> hot.

But can it work in C2? :-)

My impression of C2 is that specialization isn't in the list of things
it does well. If we had a general-purpose specialization mechanism in
Hotspot, things would definitely be a *lot* easier. We might not even
need indy...just write Java code that does all your MH translations
and specialize it to the caller's call site.

We can certainly get C2 to do these things for us...by generating a
crapload of mostly-redundant bytecode. Oh wait...

- Charlie


Re: The Great Startup Problem

2014-08-25 Thread Charles Oliver Nutter
On Mon, Aug 25, 2014 at 4:32 AM, Marcus Lagergren
 wrote:
> LambdaForms were most likely introduced as a platform independent way of 
> implementing methodhandle combinators in 8, because the 7 native 
> implementation was not very stable, but it was probably a mistake to add them 
> as “real” classes instead of code snippets that can just be spliced in around 
> the callsite. (I completely lack history here, so flame me if I am wrong)

That's how I remember it, yes. The native impl was not only a bit
unstable...it was a security black hole because of all the
special-casing for method handles in the JIT, and it had serious
issues tracking type information correctly (infamous NCDFE problem).
LFs aren't perfect, but we are way better off now than we were with
that implementation.

I do remember a conversation I had with Chris Thalinger about how it
seemed wrong that method handles were treated as a middle grey area
between call site and target, potentially not inlining in either
direction. My suggestion was to treat all handles bound into a call
site as though they were simply added bytecode in the surrounding
method...essentially, force inline non-direct handles into the caller
immediately (for some definition of "immediately") and let the only
remaining decision be which DMHs to inline as well. It worked ok for
simple cases we tried, but there were some places it didn't work well.
I don't remember the details.

We also did a rough equivalent to indy for JRuby's dispatch, but
supported on any JVM:

* All Ruby-callable methods have unique generated invoker class for
arities 0-3,N. These invokers contained all argument adaptation, heap
frame management, etc...just like a force-compiled MH chain.
* Each call site gets a synthetic method body that does lookup,
caching, and dispatch. Dispatch passes directly into those 0-3,N call
paths, and for matching arities it should inline straight through (the
invokers implement all direct-path arities as direct calls to the
appropriate code).
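Roughly, each generated call site method body looks like the following. This is a hand-written illustration of the scheme with hypothetical names; the real call-site methods and invokers are generated bytecode:

```java
import java.util.HashMap;
import java.util.Map;

public class SiteSketch {
    public interface Invoker {
        Object call(RObject self, Object arg0);   // the arity-1 direct path
    }

    public static class RClass {
        public final Map<String, Invoker> methods = new HashMap<>();
    }

    public static class RObject {
        public final RClass meta;
        public RObject(RClass meta) { this.meta = meta; }
    }

    // One of these per call site; call() is the synthetic method body.
    public static class FooSite {
        private RClass cachedClass;
        private Invoker cachedInvoker;

        public Object call(RObject self, Object arg0) {
            RClass klass = self.meta;
            if (klass != cachedClass) {                   // monomorphic cache check
                cachedInvoker = klass.methods.get("foo"); // lookup + re-cache on miss
                cachedClass = klass;
            }
            // direct arity-matched dispatch; with inlining thresholds bumped,
            // site -> invoker -> target can inline straight through
            return cachedInvoker.call(self, arg0);
        }
    }
}
```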

These generated call site methods were only monomorphic, but this
setup gave us fully inlinable dynamic dispatches without indy. It
worked well if we bumped up inlining thresholds (this was
pre-incremental JIT) but we shelved it at the time. However, I'm
probably going to explore this path again to get near-indy speeds on
non-indy JVMs for the new IR-based JIT.

Put a bit more directly: I can generate a load of bytecode to get
indy-like behavior with or without indy too. The gulf between the
current indy implementation and my way – explicitly generating code
where and when I need it – is LambdaForm interpretation and
translation.

> For 9, it seems that we need a way to implement an indy that doesn’t result 
> in class generation and installation of anonymous runtime classes. Note that 
> _class installation_ as such is also a large overhead in the JVM - even more 
> so when we regenerate more code to get more optimal types. I think we need to 
> move from separate classes to inlined code, or something that requires 
> minimium bookkeeping. I think this may be subject to profile pollution as 
> well, but I haven’t been able to get my head around the ramifications yet.

I am going to play with the property Jochen mentioned, which forces
LFs to JIT much sooner. I feel like we're almost where we need to be,
but it feels like LFs need to be more directly represented as IR in C2
rather than going through this foggy middle ground of JVM bytecode.
*I* can do foggy JVM bytecode...indy should be doing a lot better than
that.

Hell, should MethodHandle be backed by Graal IR instead of LFs? It
would still be interpretable, but when we go to JIT the chain we're
losing a lot less in translation, and we can do site or
target-specific specialization at that point.

I always saw MHs as a general-purpose call site IR. Maybe we should
make good on that.

- Charlie


Re: The Great Startup Problem

2014-08-24 Thread Charles Oliver Nutter
On Sun, Aug 24, 2014 at 12:55 PM, Jochen Theodorou  wrote:
> afaik you can set how many times a lambda form has to be executed before it
> is compiled... what happens if you set that very low... like 1 and disable
> tiered compilation?

Forcing all handles to compile early has the same negative
effect...most are only called once, and the overhead of reifying them
outweighs the cost of interpreting them.

I need to play with it more, though. The property I think you're
referring to did not appear to help us much.
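For reference, the knob being discussed is presumably the JDK-internal (and unsupported) LambdaForm compile threshold property, which sets how many interpreted passes a lambda form takes before being compiled to bytecode. A run matching Jochen's suggested experiment would look something like this; myapp.jar is a placeholder:

```shell
# Assumption: this is the property in question (JDK-internal, subject to change).
java -Djava.lang.invoke.MethodHandle.COMPILE_THRESHOLD=1 \
     -XX:-TieredCompilation -jar myapp.jar
```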

>> We obviously still love working with OpenJDK, and it remains the best
>> platform for building JRuby (and other languages). However, our
>> failure as a community to address these startup/warmup issues is
>> eventually going to kill us. Startup time remains the #1 complaint
>> about JRuby, and warmup time may be a close second.
>
>
> how do normal ruby startup times compare to JRuby for a rails app?

Perhaps 10x faster startup across the board in C Ruby. With tier 1 we
can get it down to 5x or so. It's incredibly frustrating for our
users.

> All in all, the situation is for the Groovy world quite different I would
> say.

I'd guess that developers in the Groovy world typically do all their
development in an IDE, which can keep a runtime environment available
all the time. Contrast this to pretty much everyone not from a Java or
C# background, where their IDE is a text editor and a command line.

You're also right that it's not quite a fair comparison. Rails is 100%
Ruby. Perhaps 50-75% of the libraries in a given app are 100% Ruby.

- Charlie


Re: The Great Startup Problem

2014-08-24 Thread Charles Oliver Nutter
On Sun, Aug 24, 2014 at 12:02 PM, Per Bothner  wrote:
> On 08/24/2014 03:46 AM, Marcus Lagergren wrote:
>> This is mostly invokedynamic related. Basically, an indy callsite
>> requires a lot of implicit class and byte code generation, that is
>> the source of the overhead we are mostly discussing. While tiered
>> compilation adds non determinism, it is usually (IMHO) bearable…

Indy aggravates the situation...it's easily an order of magnitude more
overhead at boot time.

I am also talking about startup time without indy, however. I'll try
to be more specific about our boot time overhead later in this reply.

> (1) Kawa shows you can have dynamic languages on the JVM that both
> run fast and have fast start-up.

Like Clojure, I'd only consider Kawa to be *somewhat* dynamic. Most
function calls can be statically dispatched, no? I think it's a poor
comparison to languages that have fully dynamic method lookup at all
(or most) sites.

> (2) Other dynamic languages (Ruby, JavaScript, PHP) have had more problems,
> possibly because they are "too dynamic".  Or perhaps just their kind of
> "dynamicism" is a poor match for the JVM.

They're not "too dynamic"...they're "pervasively dynamic". But this is
a red herring...I don't believe Ruby's dynamism is the source of our
startup time issues.

> (3) "Too dynamic" does not inherently mean a flaw in either the JVM *or*
> the language, just a mis-match.  (Though I'm of the school that believes
> "more staticness" is better for programmer productivity and software quality
> - as well as performance.  Finding the right tradeoff is hard.)

I believe development tasks will require not just a balance of
dynamism and staticism, but a range of languages along that spectrum.
There is no one true language, and no one true balance between dynamic
and static.

I think this is birdwalking away from the original problem, though.

> (4) Invokedynamic was a noble experiment to alleviate (2), but so far it
> does not seem to have solved the problems.

Conceptually, invokedynamic has proven itself incredibly capable. In
reality, the implementation has been harder and taken longer than we
expected. We're also butting up against a JVM that has been optimized
around Java for years...it's hard to teach that old dog new tricks.

> (5) It is reasonable to continue to seek improvements in invokedynamic,
> but in terms of resource prioritization other enhancement in the Java
> platform
> (value types, tagged values, reified generics, continuations, removing class
> size
> limitations, etc etc) are more valuable.

Many of which will probably use invokedynamic in some form under the
covers. Getting invokedynamic solid, fast, and predictable should be
priority one for JVM hackers right now.

> (6) That of course does not preclude an "aha": If we made modest change xyz,
> that could be a big help.  I just don't think Oracle or the community should
> spend too much time on "fixing" invokedynamic.

I disagree wholeheartedly! Invokedynamic is by far the best tool we
have going forward to extend the JVM and languages that run atop it.
It's going through growing pains, though.

I wanted to describe JRuby's boot time, so people don't think this is
a problem of a "too dynamic" language, or solely an invokedynamic
issue.

As with any other JVM languages, JRuby is almost entirely written in
Java. So our entire runtime needs to warm up before we get decent
performance. This includes:

* A very complicated parser. Ruby's grammar has been designed to
accommodate programmers rather than parsers, and it has thousands of
productions and state transitions. Note that all Ruby applications
boot from source every time they start up.

* An AST-based interpreter (JRuby 1.7). The AST nodes call each other,
and nested nodes deepen the stack. This is not as efficient,
memory-wise, as a flat instruction-based interpreter (IR in JRuby
9000), but it has excellent inlining characteristics. A CallNode
typically will call an ArgsNode to process args, a BlockArg node to
process captured closures, etc. So the AST kinda-sorta trace JITs on
the small. It's worth noting that JRuby's AST interpreter, once warm,
is much faster at running Ruby code than cold, compiled Ruby (JVM
bytecode) in the JVM interpreter.

* A traditional CFG-based IR compiler (JRuby 9000). We have been
working to reduce the overhead of the new compiler, since it is
additional overhead compared to JRuby 1.7. We're getting there.

* An IR-based interpreter (JRuby 9000). The IR interpreter uses one
large frame for the interpreter and small frames for instruction
bodies. We have been working to manually inline just enough logic to
make the IR interpreter of similar or less overhead compared to the
AST interpreter. This may involve the introduction of
superinstructions, or we may get things "good enough" and rely on the
JVM bytecode JIT to take us the rest of the way.

* A JVM bytecode compiler, from either AST or IR. The latter is much
simpler, but this is still an

The Great Startup Problem

2014-08-22 Thread Charles Oliver Nutter
Marcus coaxed me into making a post about our indy issues. Our indy
issues mostly surround startup and warmup time, so I'm making this a
general post about startup and warmup.

When I started working on JRuby 7 years ago, I hoped we'd have a good
answer for poor startup time and long warmup times. Today, the answers
are no better -- and in many cases much worse -- than when I started.

Here's a summary of our experience over the years...

* client versus server

Early on, we made JRuby's launcher use client mode by default. This
was by far the best way to get good startup performance, but it led to
us perpetuating the old question "which mode are you running in" when
people reported poor steady-state performance.

* Tiered compiler

The promise of the tiered compiler was great: client-fast startup with
server-fast steady state. In practice, tiered has failed to meet
expectations for us. The situation is aggravated by the loss of
-client and -server flags.

On the startup side, we have found that the tiered compiler never even
comes close to the startup time of -client. For a nontrivial app
startup, like a Rails app, we see a 50% reduction in startup time by
forcing tier 1 (which is C1, the old -client mode) rather than letting
the tiered compiler work normally.

Obviously limiting ourselves to tier 1 means performance is reduced,
but these days our #1 user complaint is startup time. So, we have AGAIN
taken the step of putting startup-improving flags into our launchers:
jruby --dev forces tier 1 + client mode.

On the steady-state side, the tiered compiler is rather unpredictable.
Some cases will be faster (presumably from better profiling in earlier
tiers), while others will be much slower. And it can vary from run to
run...tiered steady-state performance is even harder to predict than
C2 (-server). We have done no investigation here.

* Invokedynamic

We love indy. We love it more than just about anyone. But we have
again had to make indy support OFF by default in JRuby 1.7.14 and may
have to do the same for JRuby 9000.

Originally, we had indy off because of the NCDFE bugs in the old
implementation. LambdaForms have fixed all that, and with JIT
improvements in the past year they generally (eventually) reach the
same steady-state performance.

Unfortunately, LambdaForms have an enormous startup-time cost. I
believe there's two reasons for this:

1. Method handle chains can now result in dozens of lambda forms,
making the initial bootstrapping cost much higher. Multiply this by
thousands of call sites, all getting hit for the first time. Multiply
that by PIC depth. And then remember that many boot-time operations
will blow out those caches, so you'll start over repeatedly. Some of
this can be mitigated in JRuby, but much of it cannot.

2. Lambda forms are too slow to execute and take too long to optimize
down to native code. Lambda forms work sorta like the tiered compiler.
They'll be interpreted for a while, then they'll become JVM bytecode
for a while, which interprets for a while, then the tiered compiler's
first phase will pick it up. There's no way to "commit" a lambda
form you know you're going to be hitting hard, so it takes FOREVER to
get from a newly-bootstrapped call site to the 5 assembly instructions
that *actually* need to run.

I do want to emphasize that for us, LambdaForms usually do get to the
same peak performance we saw with the old implementation. It's just
taking way, way too long to get there.

Because of these issues, JRuby's new --dev flag turns invokedynamic
off, and JRuby 1.7.14 will once again turn indy off by default on all
JVM versions.

* Other ways of mitigating startup time

We have recommended Nailgun in the past. Nailgun keeps a JVM running
in the background, and you toss it commands to run. It works well as
long as the commands are actually self-contained, self-cleaning units
of work; spin up one thread or leave resources open, and the Nailgun
server eventually becomes unusable.

We now recommend Drip as a similar solution. For each command you run,
Drip attempts to start additional larval JVMs in the background in
preparation for future commands. You can configure those instances to
pre-boot libraries or application resources, to reduce the work done
at startup for the next command (e.g. preboot your Rails application,
and then the next command just has to utilize it). Drip is cleaner
than Nailgun, but never quite achieves the same startup time without a
lot of configuration. It is also a bit of a hack...you can easily
preboot something in the "next JVM" that is out of date by the time
you use it.

CONCLUSION...

We obviously still love working with OpenJDK, and it remains the best
platform for building JRuby (and other languages). However, our
failure as a community to address these startup/warmup issues is
eventually going to kill us. Startup time remains the #1 complaint
about JRuby, and warmup time may be a close second.

What are the rest of you doing to deal with

Re: How high are the memory costs of polymorphic inline caches?

2014-08-18 Thread Charles Oliver Nutter
Hello, fellow implementer :-)

On Mon, Aug 18, 2014 at 6:01 AM, Raffaello Giulietti
 wrote:
> So, the question is whether some of you has experience with large scale
> projects written in a dynamic language implemented on the JVM, that makes
> heavy use of indy and PICs. I'm curious about the memory load for the PICs.
> I'm also interested whether the standard Oracle server JVM satisfactorily
> keeps up with the load.

JRuby has implemented call sites using this sort of PIC structure
since the beginning. I dare say we were the first.

Experimentally, I determined that the cost of a PIC became greater
than a non-indy, non-inlining monomorphic cache at about 5 deep. By
cost, I mean the overhead involved in dispatching from the caller to an
empty method...essentially just the cost of the plumbing.

Now of course that number's going to vary, but overall a small PIC
seems to have value...especially when you consider that the cost of
rebinding a call site is rather high.

We do have JRuby users using our indy logic in production, and none of
their concerns have had any relation to the PIC (usually, it's just
startup/warmup concerns).

> For example, we have a large Smalltalk application with about 50'000 classes
> and about 600'000 methods. In Smalltalk, almost everything in code is a
> method invocation, including operators like +, <=, etc. I estimate some 5-10
> millions method invocation sites. How many of them are active during a
> typical execution, I couldn't tell. But if the Smalltalk runtime were
> implemented on the JVM, PICs would quite certainly represent a formidable
> share of the memory footprint.

That's why you limit their size. If an inline cache behind a GWT takes
up N bytes in memory, a PIC based on the same invalidation logic
should just be N * X where X is the depth of your PIC.

Of course, you could do what we do in JRuby, and make the PIC depth
configurable to try out a few things.
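A toy PIC built this way, with one GWT per cached class stacked in front of the previous chain, might look like the following. The dispatch targets are stand-ins for real cached methods, and a real site would cap the chain at a configurable depth before going megamorphic:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import static java.lang.invoke.MethodType.methodType;

// Each PIC entry is one GWT, so memory and dispatch cost grow roughly
// linearly with depth (the N * X estimate above).
public class Pic {
    public static boolean isClass(Class<?> expected, Object o) {
        return o.getClass() == expected;
    }

    public static String describe(String label, Object o) {
        return label;   // stand-in for the real cached method body
    }

    static MethodHandle entry(Class<?> klass, String label, MethodHandle next)
            throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle test = lookup.findStatic(Pic.class, "isClass",
                methodType(boolean.class, Class.class, Object.class)).bindTo(klass);
        MethodHandle target = lookup.findStatic(Pic.class, "describe",
                methodType(String.class, String.class, Object.class)).bindTo(label);
        return MethodHandles.guardWithTest(test, target, next);
    }

    public static MethodHandle build() throws ReflectiveOperationException {
        // Deepest fallback: the megamorphic miss (a real site would re-lookup here)
        MethodHandle miss = MethodHandles.dropArguments(
                MethodHandles.constant(String.class, "miss"), 0, Object.class);
        // Two entries deep; a configurable max depth would cap this
        return entry(Integer.class, "int path",
               entry(String.class, "string path", miss));
    }
}
```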

> More generally, apart from toy examples, are there studies in real-world
> usage of indy and PICs in large applications?
> Perhaps some figures from the JRuby folks, or better, their users'
> applications would be interesting.

The only "studies" we have are the handful of JRuby users running with
indy enabled in production. They love the higher performance, hate the
startup/warmup time, and most of them had to either bump permgen up or
switch to Java 8 to handle the extra code generation happening in
indy.

I'm happy to answer any other questions.

- Charlie


Re: Loopy CallSite

2014-07-12 Thread Charles Oliver Nutter
I played with this some years ago. Doesn't it just become recursive,
because it won't inline through the dynamicInvoker?

- Charlie (mobile)
On Jul 12, 2014 9:36 AM, "Remi Forax"  wrote:

> It seems that the JIT gets lost when there is a loopy call site and never
> stabilizes (or the steady state is reached only after the program ends).
>
> import java.lang.invoke.MethodHandle;
> import java.lang.invoke.MethodHandles;
> import java.lang.invoke.MethodType;
> import java.lang.invoke.MutableCallSite;
>
> public class Loop {
>   static class LoopyCS extends MutableCallSite {
> public LoopyCS() {
>   super(MethodType.methodType(void.class, int.class));
>
>   MethodHandle target = dynamicInvoker();
>   target = MethodHandles.filterArguments(target, 0, FOO);
>   target = MethodHandles.guardWithTest(ZERO,
>   target,
> MethodHandles.dropArguments(MethodHandles.constant(int.class,
> 0).asType(MethodType.methodType(void.class)), 0, int.class));
>   setTarget(target);
> }
>   }
>
>   static final MethodHandle FOO, ZERO;
>   static {
> try {
>   FOO = MethodHandles.lookup().findStatic(Loop.class, "foo",
> MethodType.methodType(int.class, int.class));
>   ZERO = MethodHandles.lookup().findStatic(Loop.class, "zero",
> MethodType.methodType(boolean.class, int.class));
> } catch (NoSuchMethodException | IllegalAccessException e) {
>   throw new AssertionError(e);
> }
>   }
>
>   private static boolean zero(int i) {
> return i != 0;
>   }
>
>   private static int foo(int i) {
> COUNTER++;
> return i - 1;
>   }
>
>   private static int COUNTER = 0;
>
>   public static void main(String[] args) throws Throwable {
> for(int i=0; i<100_000; i++) {
>   new LoopyCS().getTarget().invokeExact(1_000);
> }
> System.out.println(COUNTER);
>   }
> }
>
> cheers,
> Rémi
>
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>


FORK

2014-04-24 Thread Charles Oliver Nutter
What would it take to make HotSpot forkable? Obviously we'd need to
pause all VM threads and restart them on the other side (or perhaps
a prefork mode that doesn't spin up threads?), but I know there are
challenges with signal handlers etc.

I ask for a few reasons...

* Dalvik has shown what you can do with a "larval" preforking setup.
This is a big reason why Android apps can run in such a small amount
of memory and start up so quickly.
* Startup time! If we could fork an already-hot JVM, we could hit the
ground running with *every* command, *and* still have truly separate
processes.
* There's a lot of development and scaling patterns that depend on
forking, and we get constant questions about forking on JRuby.
* Rubinius -- a Ruby VM with partially-concurrent GC, a
signal-handling thread, JIT threads, and real parallel Ruby threads --
supports forking. They bring the threads to a safe point, fork, and
restart them on the other side. Color me jealous.

So...given that OpenJDK is rapidly expanding into smaller-profile
devices and new languages and development patterns, perhaps it's time
to make it fit into the UNIX philosophy. Where do we start?

- Charlie


Re: Number of Apps per JVM

2014-01-12 Thread Charles Oliver Nutter
I think some of these requirements are at cross purposes. For example,
how can you have thread-safe object access but still be able to freely
pass objects across "processes"? How can you freely pass objects
across in-process VMs but still enforce memory red lines? The better
isolation you get between processes/VMs, the more overhead you impose
on communication between them.

I have to admit I don't know how Kilim does its object isolation.

- Charlie

On Sun, Jan 12, 2014 at 7:12 PM, Mark Roos  wrote:
> Thanks for the suggestion on Waratek,  not sure how it would address the
> process to process
> messaging issue.  It did lead me to another very interesting read though,
> http://osv.io.  Again
> not an answer for the messaging but something that I have always thought
> would be interesting to
> try,  a stripped down jvm+os.  Perhaps JavaOS :-).
>
> thx
> mark
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>


Re: RFC: JDK-8031043: ClassValue's backing map should have a smaller initial size

2014-01-11 Thread Charles Oliver Nutter
On Thu, Jan 9, 2014 at 7:47 PM, Christian Thalinger
 wrote:
>
> On Jan 9, 2014, at 5:25 PM, Charles Oliver Nutter  wrote:
>
>> runtime. Generally, this does not exceed a few dozen JRuby instances
>> for an individual app, and most folks don't deploy more than a few
>> apps in a given JVM.
>
> Interesting.  Thanks for the information.

I forgot to mention: more and more users are going with exactly one
JRuby runtime per app, and most Ruby folks deploy one app in a given
JVM. So the number of values attached to a class is trending toward 1.

- Charlie


Re: RFC: JDK-8031043: ClassValue's backing map should have a smaller initial size

2014-01-09 Thread Charles Oliver Nutter
It depends how JRuby is deployed. If the same code runs in every JRuby
runtime, then there would be one value attached to a given class per
runtime. Generally, this does not exceed a few dozen JRuby instances
for an individual app, and most folks don't deploy more than a few
apps in a given JVM.
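For readers unfamiliar with the pattern being sized here, this is a minimal sketch (hypothetical names, not JRuby's real metaclass code) of why each live runtime adds one entry to a class's ClassValue backing map:

```java
// Each runtime instance holds its own ClassValue, so a single Class
// accumulates one cached value per runtime that touches it.
public class RuntimeCache {
    public static final class Runtime {
        public final ClassValue<Object> metaClasses = new ClassValue<Object>() {
            @Override protected Object computeValue(Class<?> type) {
                // Stand-in for building a real metaclass wrapper.
                return "meta:" + type.getSimpleName();
            }
        };
    }

    public static void main(String[] args) {
        Runtime r1 = new Runtime();
        Runtime r2 = new Runtime();
        // Same class, two runtimes -> two entries in String.class's backing map.
        System.out.println(r1.metaClasses.get(String.class)); // prints "meta:String"
        System.out.println(r2.metaClasses.get(String.class)); // prints "meta:String"
    }
}
```

With one runtime per app and one app per JVM, each class's map holds a single entry, which is why a small initial size is attractive.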

- Charlie

On Thu, Jan 9, 2014 at 1:16 PM, Christian Thalinger
 wrote:
>
> On Jan 9, 2014, at 2:46 AM, Jochen Theodorou  wrote:
>
>> Am 08.01.2014 21:45, schrieb Christian Thalinger:
>> [...]
>>> If we’d go with an initial value of 1 would it be a performance problem for 
>>> you if it grows automatically?
>>
>> that means the map will have to grow for hundreds of classes at startup.
>> I don't know how much impact that will have
>
> If it’s only hundreds it’s probably negligible.  You could do a simple 
> experiment if you are worried:  change ClassValueMap.INITIAL_ENTRIES to 1, 
> compile it and prepend it to the bootclasspath.
>
>>
>> bye Jochen
>>
>> --
>> Jochen "blackdrag" Theodorou - Groovy Project Tech Lead
>> blog: http://blackdragsview.blogspot.com/
>> german groovy discussion newsgroup: de.comp.lang.misc
>> For Groovy programming sources visit http://groovy-lang.org
>>
>> ___
>> mlvm-dev mailing list
>> mlvm-dev@openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: JVM Language Summit (Europe)?

2013-10-09 Thread Charles Oliver Nutter
Not bad... I think I could manage that.

- Charlie

On Sat, Oct 5, 2013 at 8:23 AM, Ben Evans  wrote:
> How about a 1.5 day conference on Thursday 30th & morning of Friday 31st
> January?
>
> Then people who are coming to Europe for FOSDEM can arrive in London on
> Weds, have the language summit for 1.5 days & we can all get the Eurostar to
> Brussels together to arrive in time for the Delerium cafe?
>
> Thanks,
>
> Ben
>
>
> On Sat, Oct 5, 2013 at 11:09 AM, Martijn Verburg 
> wrote:
>>
>> Hi all,
>>
>> Great - I think that's enough positive responses + the ones I got on
>> twitter :-).  Ben and I will put our thinking caps on and see if we can put
>> something very close to FOSDEM so that folks can just Eurostar across (it's
>> the only way to travel ;p).
>>
>> We'll try to grab some sponsorship etc, but I'll warn people that for now
>> they should expect to pay their own way for travel and accommodation.
>>
>> Will post here again when we have some more concrete plans!
>>
>>
>> Cheers,
>> Martijn
>>
>>
>> On 5 October 2013 07:30, Cédric Champeau 
>> wrote:
>>>
>>> Hi!
>>>
>>> I am interested too, and I'd vote for an "opposite" summit.
>>>
>>> Cédric
>>>
>>>
>>> 2013/10/2 Martijn Verburg 

 Hi all,

 Hope this is the right mailing list to post on, apologies for the slight
 OT post.

 A few people asked whether the LJC could/would host a JVM language
 summit in Europe which would hopefully cover the EMEA based folks that 
 can't
 make the existing summit.

 I'd like to get an idea of whether there's appetite for this and if so
 when it should be run:

 * At the same time and have some video-conferencing sessions? OR
 * At a time almost 'opposite' to the existing summit so that there's a
 summit roughly every 6-months.

 Ping me directly with your thoughts (unless this is the right mailing
 list - in that case reply back here).

 Cheers,
 Martijn

 ___
 mlvm-dev mailing list
 mlvm-dev@openjdk.java.net
 http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

>>>
>>>
>>> ___
>>> mlvm-dev mailing list
>>> mlvm-dev@openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>>>
>>
>>
>> ___
>> mlvm-dev mailing list
>> mlvm-dev@openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>>
>
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>


Re: JVM Language Summit (Europe)?

2013-10-04 Thread Charles Oliver Nutter
On Oct 4, 2013 5:50 PM, "George Marrows" 
wrote:
> I'd suggest 'opposite' to the existing summit, so that we might get some
of the key figures from that conf (Brian Goetz, John Rose, Charlie Nutter
etc) over in Europe. Charlie has certainly said he'd be interested in
coming to one in Europe, particularly if it preceded/followed another
European conf he would like to attend.

Absolutely!

The FOSDEM Java room has kinda served that purpose but it has a broader
focus. I would definitely like an official event.

> On Wed, Oct 2, 2013 at 1:38 PM, Martijn Verburg 
wrote:
>>
>> Hope this is the right mailing list to post on, apologies for the slight
OT post.

This is a pretty good list. Also JVM-L (I will forward).

>> * At the same time and have some video-conferencing sessions? OR
>> * At a time almost 'opposite' to the existing summit so that there's a
summit roughly every 6-months.

Opposite for sure. I hate to stack another event on the FOSDEM+Jfokus
schedule but that would maybe maximize the number of US folks that would be
there.

- Charlie


Re: Interpreting Mission Control numbers for indy

2013-09-18 Thread Charles Oliver Nutter
A bit more on performance numbers for this application.

With no indy, monomorphic caches...the full application (a data load)
runs in about a minute. I fully recognize that this is a short run,
but JMC seems to indicate the bulk of code has compiled well before
the halfway point.

With 7u40 or 8, no tiered compilation, it takes about two minutes.

Tiered reduces non-indy time to 51s and indy time to 1m29s.

Tiered + indy + only using monomorphic cache (no direct binding) runs
in 1m, still 9s slower than non-indy.

With normal settings, indy call sites do settle down and are mostly
monomorphic. For the two phases of the data load, I stop seeing JRuby
bind indy call sites a couple seconds in.

There does not appear to be any difference in performance on this app
between 7u40 and 8b103.

Like I say...I think the user would be willing to share the
application, and I feel like the numbers warrant investigation.
Standing by! :-)

- Charlie

On Wed, Sep 18, 2013 at 10:39 AM, Charles Oliver Nutter
 wrote:
> I've been playing with JMC a bit tonight, running a user's application
> that's about 2x slower using indy than using trivial monomorphic
> caches (and no indy call sites). I'm trying to understand how to
> interpret what I see.
>
> In the Code/Overview results, where it lists "hot packages", the #1
> and #2 packages are java.lang.invoke.LambdaForm$MH and DMH, accounting
> for over 37% of samples. That sounds high, but I'm willing to grant
> they're hit pretty hard for a fully dynamic application.
>
> Results in the "Hot Methods" tab show similar things, like
> LambdaForm...invokeStatic_LL_L as the number one result and LambdaForm
> entries dominating the top 50 entries in the profile. Again, I know
> I'm hitting dynamic call sites hard and sampling is not always
> accurate.
>
> If I look at compilation events, I only see a handful of
> LambdaForm...convert being compiled. I'm not sure if that's good or
> bad. My assumption is that LFs don't show up here because they're
> always being inlined into a caller.
>
> The performance numbers for the app have me worried too. If I run
> JRuby with stock settings, we will chain up to 6 call targets at a
> call site. The lower I drop this number, the better performance gets;
> when I drop all the way to zero, forcing all invokedynamic call sites
> to fail over immediately to a monomorphic inline cache, performance
> *almost* gets back to the non-indy implementation. This leads me to
> believe that the less I use invokedynamic (or the fewer LFs involved),
> the better. That doesn't bode well.
>
> I believe the user would be happy to allow me to make these JMC
> recordings available, and I'm happy to re-run with additional events
> or gather other information. The JRuby community has a number of very
> large applications that push the limits of indy. We should work
> together to improve it.
>
> - Charlie


Interpreting Mission Control numbers for indy

2013-09-18 Thread Charles Oliver Nutter
I've been playing with JMC a bit tonight, running a user's application
that's about 2x slower using indy than using trivial monomorphic
caches (and no indy call sites). I'm trying to understand how to
interpret what I see.

In the Code/Overview results, where it lists "hot packages", the #1
and #2 packages are java.lang.invoke.LambdaForm$MH and DMH, accounting
for over 37% of samples. That sounds high, but I'm willing to grant
they're hit pretty hard for a fully dynamic application.

Results in the "Hot Methods" tab show similar things, like
LambdaForm...invokeStatic_LL_L as the number one result and LambdaForm
entries dominating the top 50 entries in the profile. Again, I know
I'm hitting dynamic call sites hard and sampling is not always
accurate.

If I look at compilation events, I only see a handful of
LambdaForm...convert being compiled. I'm not sure if that's good or
bad. My assumption is that LFs don't show up here because they're
always being inlined into a caller.

The performance numbers for the app have me worried too. If I run
JRuby with stock settings, we will chain up to 6 call targets at a
call site. The lower I drop this number, the better performance gets;
when I drop all the way to zero, forcing all invokedynamic call sites
to fail over immediately to a monomorphic inline cache, performance
*almost* gets back to the non-indy implementation. This leads me to
believe that the less I use invokedynamic (or the fewer LFs involved),
the better. That doesn't bode well.

I believe the user would be happy to allow me to make these JMC
recordings available, and I'm happy to re-run with additional events
or gather other information. The JRuby community has a number of very
large applications that push the limits of indy. We should work
together to improve it.

- Charlie


Re: Reproducible InternalError in lambda stuff

2013-09-16 Thread Charles Oliver Nutter
On Mon, Sep 16, 2013 at 2:36 AM, John Rose  wrote:
> I have refreshed mlvm-dev and pushed some patches to it which may address
> this problem.

I'll get a build put together and see if I can get users to test it.

> If you have time, please give them a try.  Do "hg qgoto meth-lfc.patch".
>
> If this stuff helps we would like to work towards a fix in 7u.
>
> What is your time frame for JRuby 1.7.5?

It is on hold indefinitely while we work out user-reported issues
(most are not 7u40-related, but we'd like to have an answer for those
before release too).

I've attached one user's hs_err dump. This was with a 4GB heap. Code
cache full and failing spectacularly?

- Charlie


hs_err_pid1184.log
Description: Binary data


Re: Reproducible InternalError in lambda stuff

2013-09-14 Thread Charles Oliver Nutter
We are getting many reports of memory issues under u40 running appear with
indy support. Some seem to go away with bigger heaps, but others are still
eventually failing. This is a very high priority for us because we had
hoped to release JRuby 1.7.5 with indy enabled (finally) and that may not
be possible.
On Sep 14, 2013 3:07 PM, "David Chase"  wrote:

> I am not sure, but it seemed like "something" bad floated into jdk8 for a
> little while, and then floated back out again.
> I haven't kept close enough track of the gc-dev mailing list, but for a
> few days I was frequently running out of memory when I had not been before
> (i.e., doing a build, or simply initializing some of the internal tests) --
> this on a machine where when I checked, at least 4G was free for the taking.
>
> Something happened, and the problems went away.
>
> On 2013-09-13, at 6:59 PM, Charles Oliver Nutter 
> wrote:
>
> > On Sat, Sep 14, 2013 at 12:57 AM, Charles Oliver Nutter
> >  wrote:
> >> * More memory required when running with indy versus without, all
> >> other things kept constant (reproduced by two people, one of them me)
> >
> > I should say *significantly more* memory here. The app Alex was
> > running had to go from 1GB heap / 256MB permgen to 2G/512M when it was
> > running *fine* before...and this is just for running the *tests*.
> >
> > - Charlie
> > ___
> > mlvm-dev mailing list
> > mlvm-dev@openjdk.java.net
> > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
>


Re: Reproducible InternalError in lambda stuff

2013-09-13 Thread Charles Oliver Nutter
On Sat, Sep 14, 2013 at 12:57 AM, Charles Oliver Nutter
 wrote:
> * More memory required when running with indy versus without, all
> other things kept constant (reproduced by two people, one of them me)

I should say *significantly more* memory here. The app Alex was
running had to go from 1GB heap / 256MB permgen to 2G/512M when it was
running *fine* before...and this is just for running the *tests*.

- Charlie


Re: Reproducible InternalError in lambda stuff

2013-09-13 Thread Charles Oliver Nutter
I do not...but it appears to be tied to getting an OOM when inside lambda code.

We now have a third-party report of the same issue. Because the
internal error appears to nuke the original exception, we don't know
for sure that this is memory-related, but the user did see *other*
threads raise OOM and increasing memory solved it.

https://github.com/jruby/jruby/issues/1014

So...there's two things that are bad things here...

* More memory required when running with indy versus without, all
other things kept constant (reproduced by two people, one of them me)
* InternalError bubbling out and swallowing the cause (reproduced by
the same two people)...this may count as two issues.

My original reproduction did not appear to fire on Java 8, but it also
appeared to run forever...so it's possible that we were at a specific
memory threshold (permgen? normal heap? meatspace?) or Java 8 may be
failing more gracefully.

Feel free to discuss or offer suggestions to Alex on the bug report
above. I will be monitoring.

- Charlie

On Mon, Sep 9, 2013 at 6:21 PM, Christian Thalinger
 wrote:
>
> On Sep 6, 2013, at 11:11 PM, Charles Oliver Nutter  
> wrote:
>
>> I can reproduce this by running a fairly normalish command in JRuby:
>>
>> (Java::JavaLang::InternalError)
>>guard=Lambda(a0:L,a1:L,a2:L,a3:L,a4:L)=>{
>>
>> t5:I=MethodHandle(ThreadContext,IRubyObject,IRubyObject,IRubyObject)boolean(a1:L,a2:L,a3:L,a4:L);
>>
>> t6:L=MethodHandleImpl.selectAlternative(t5:I,(MethodHandle(ThreadContext,IRubyObject,IRubyObject,IRubyObject)IRubyObject),(MethodHandle(ThreadContext,IRubyObject,IRubyObject,IRubyObject)IRubyObject));
>>t7:L=MethodHandle.invokeBasic(t6:L,a1:L,a2:L,a3:L,a4:L);t7:L}
>>
>> I think it's happening at an OutOfMemory event (bumping up memory
>> makes it go away), so it may not be a critical issue, but I thought
>> I'd toss it out here.
>
> Do know where it's coming from?  -- Chris
>
>>
>> - Charlie
>> ___
>> mlvm-dev mailing list
>> mlvm-dev@openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Reproducible InternalError in lambda stuff

2013-09-06 Thread Charles Oliver Nutter
I can reproduce this by running a fairly normalish command in JRuby:

(Java::JavaLang::InternalError)
guard=Lambda(a0:L,a1:L,a2:L,a3:L,a4:L)=>{

t5:I=MethodHandle(ThreadContext,IRubyObject,IRubyObject,IRubyObject)boolean(a1:L,a2:L,a3:L,a4:L);

t6:L=MethodHandleImpl.selectAlternative(t5:I,(MethodHandle(ThreadContext,IRubyObject,IRubyObject,IRubyObject)IRubyObject),(MethodHandle(ThreadContext,IRubyObject,IRubyObject,IRubyObject)IRubyObject));
t7:L=MethodHandle.invokeBasic(t6:L,a1:L,a2:L,a3:L,a4:L);t7:L}

I think it's happening at an OutOfMemory event (bumping up memory
makes it go away), so it may not be a critical issue, but I thought
I'd toss it out here.

- Charlie


Re: Classes on the stack trace (was: getElementClass/StackTraceElement, was: @CallerSensitive public API, was: sun.reflect.Reflection.getCallerClass)

2013-07-30 Thread Charles Oliver Nutter
On Tue, Jul 30, 2013 at 7:17 AM, Peter Levart  wrote:
> For outside JDK use, I think there are two main needs, which are actually
> distinct:
>
> a) the caller-sensitive methods
> b) anything else that is not caller-sensitive, but wants to fiddle with the
> call-stack
>
> For caller-sensitive methods, the approach taken with new
> Reflection.getCallerClass() is the right one, I think. There's no need to
> support a fragile API when caller-sensitivity is concerned, so the lack of
> "int" parameter, combined with annotation for marking such methods is
> correct approach, I think. The refactorings to support this change in JDK
> show that this API is adequate. The "surface" public API methods must
> capture the caller class and pass it down the internal API where it can be
> used.

This is largely what I advocated and what we do in JRuby.

First of all, we've never made any guarantees about calls to
caller-sensitive methods like Class.forName. If issues were reported,
our answer was to pass in a classloader, just as you would have to do
if you had a utility library between your user code and a
Class.forName call.

Second, the presence of a hidden API to walk the stack is not an
excuse for using it and then complaining when it is taken away. Yes
yes, Unsafe falls into this category too, but in the case of Unsafe
there's no alternative. With getCallerClass, there is an alternative:
pass down the caller class. This may not be attractive, especially
given the magic provided by getCallerClass before...but it is a
solution.

Third, for language runtimes like Groovy, it seems to me that only
*effort* is required to handle the passing down of the caller class.
If we look at the facts, we see that getCallerClass is needed to skip
intermediate frames by the runtime. So the runtime knows about these
intermediate frames and knows how many to skip. This means that the
original call is not into an uncontrolled library, but instead is into
Groovy-controlled code. Passing down the caller object or class at
that point is obviously possible. Even if the hassle of passing a new
additional parameter through the call protocol is too difficult, a
ThreadLocal could be utilized for this purpose.
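The "pass down the caller class" approach above can be sketched in a few lines (hypothetical helper names; not Groovy's or JRuby's actual protocol): the runtime-controlled entry point receives the caller class explicitly and uses its loader, rather than walking the stack to discover it.

```java
public class CallerPassing {
    // Instead of Reflection.getCallerClass(int), the generated call site
    // passes its own class down through the runtime's call protocol.
    public static Class<?> loadForCaller(String name, Class<?> caller) {
        try {
            // Use the caller's loader, as Class.forName(String) would have
            // done if invoked directly from the caller's frame.
            return Class.forName(name, false, caller.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A compiled call site would emit its declaring class as the caller.
        Class<?> c = loadForCaller("java.util.ArrayList", CallerPassing.class);
        System.out.println(c.getName()); // prints "java.util.ArrayList"
    }
}
```

The same shape works for ResourceBundle.getBundle and similar caller-sensitive APIs: the caller class is captured once at the runtime's boundary and threaded through, with no dependency on hidden stack-walking APIs.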

For the logging frameworks, I do not have a solution other than the
same one we recommend to JRuby users: pass in the class or
classloader. I could also suggest generating a backtrace and walking
back to the appropriate element, but backtrace generation is currently
far too expensive to use in heavily-hit logging code (this should be
improved).

I will also say that I agree an official stack-walking capability
would be incredibly useful, and not just for this case. But that's a
much bigger fish to fry and it won't happen in JDK8 timeframe.

> Now that is the question for mlvm-dev mailing list: Isn't preventing almost
> all Lookup objects obtained by Lookup.in(RequestedLookupClass.class) from
> obtaining MHs of @CallerSensitive methods too restrictive?

Probably. It seems to me that @CallerSensitive is no different from
exposing private methods or fields through a MH. Perhaps it should
recalculate caller when it's called, perhaps it should calculate it at
lookup time, but not being retrievable at all seems like overkill. I
will grant that overkill was probably the quickest and safest solution
at the time.

> I would point out that this could all easily be solved simply by adding a
> getElementClass() method to StackTraceElement, but there was strong
> opposition to this, largely due to serialization issues. Since that is
> apparently not an option, I propose the following API, based on the various
> discussions in the last two months, StackTraceElement, and the API that .NET
> provides to achieve the same needs as listed above:

A new stack trace getter that provides classes would be an immense
improvement, but only if it did not have the same overhead as current
stack trace generation. Again, that needs to be fixed.

> Furthermore, I propose that we restore the behavior of
> sun.reflect.Reflection#getCallerClass(int) /just for Java 7/ since the
> proposed above solution cannot be added to Java 7.

Probably for 7 and 8. I'm pessimistic about its use, but the timeframe
for moving away from it is too short.

- Charlie


Re: sun.reflect.Reflection.getCallerClass(int) is going to be removed... how to replace?

2013-07-10 Thread Charles Oliver Nutter
On Wed, Jul 10, 2013 at 4:30 AM, Cédric Champeau
 wrote:
> I must second Jochen here. That getCallerClass doesn't work anymore in
> an update release is unacceptable to me. As Jochen explained, there's no
> suitable replacement so far. We can live with getCallerClass
> disappearing if there's a replacement, but obviously, the
> @CallerSensitive "solution" is not one for us. There are additional
> frames in our runtime. Also we need to support multiple JDKs (5 to 8,
> but 8 is already broken). Especially, we don't have any replacement for
> @Grab which makes use of it internally. Furthermore, I suspect
> Class.forName and ResourceBundle.getBundle are widespread in user code
> and it used to work. This is not the kind of stuff that people expect to
> break when upgrading a JDK, and we can't tell them to rewrite their code
> (especially, finding the right classloader might involve more serious
> refactoring if it needs to be passed as a method argument).

Another alternative we are not using in JRuby...but we could.

In JRuby, we pass the calling object into every call site to check
Ruby-style visibility at lookup time (we can't statically determine
visibility, and we do honor it). That gets you a bit closer to being
able to get the caller's classloader without stack tricks (though I
admit it does nothing for methods injected into a class from a
different classloader).

On Wed, Jul 10, 2013 at 4:40 AM, Noctarius  wrote:
> Maybe a solution could be an annotation to mark calls to not
> appear in any stacktrace?

Personally, I'd love to see *any* way to teach JVM about
language-specific stack traces. Currently JRuby post-processes
exception traces to mine out compiled Ruby lines and transform
interpreter frames into proper file:line pairs. A way to say "at this
point, call back my code to build a StackTraceElement" would be very
useful across languages.

Of course, omitting from stack trace has very little to do with
stack-walking frame inspection tricks like CallerSensitive.

- Charlie


Re: sun.reflect.Reflection.getCallerClass(int) is going to be removed... how to replace?

2013-07-08 Thread Charles Oliver Nutter
We advise our users to pass in a classloader. Class.forName's
stack-based discovery of classloaders is too magic anyway.

In general, when there's magic happening at the JVM level that is not
possible for us to duplicate in JRuby, we warn our users away from
depending on it.

- Charlie

On Mon, Jul 8, 2013 at 3:33 AM, Jochen Theodorou  wrote:
> Hi all,
>
> 5 days nothing... Does that mean it is like that, there is no way around
> and I have to explain my users, that Java7/8 is going to break some
> "minor" functionality?
>
> bye blackdrag
>
> --
> Jochen "blackdrag" Theodorou - Groovy Project Tech Lead
> blog: http://blackdragsview.blogspot.com/
> german groovy discussion newsgroup: de.comp.lang.misc
> For Groovy programming sources visit http://groovy-lang.org
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: jsr292-mock in Maven; coro-mock, unsafe-mock available

2013-07-07 Thread Charles Oliver Nutter
On Sun, Jul 7, 2013 at 3:16 PM, Remi Forax  wrote:
> Given that there is no need to bundle the backport with the jsr292-mock,
> I propose you something,
> you create a project on github under your name, you give me the right to
> push the code
> (I will create a textual representation of the API so you will be able
> to re-create the jar without
> having the right rt.jar available) and after you are free to do what you
> want with it :)

Ok, that works!

I have set up https://github.com/headius/jsr292-mock (you have access)
with a pom.xml ready to deploy and a basic README. You can throw
whatever you want in there and I'll structure it as appropriate for
maven and get an artifact pushed.

It would also be fine if you want to put the full generation pipeline
in there; I can have the project require Java 8, produce Java 6 (or
lower) sources, and just build/push with 8.

- Charlie


jsr292-mock in Maven; coro-mock, unsafe-mock available

2013-07-07 Thread Charles Oliver Nutter
jsr292-mock:

Ok Rémi, it's decision time :-)

We *need* to get jsr292-mock into maven somehow for JRuby's new build,
so we don't have to version the binary anymore. We'd be happy to help
set up the maven pom.xml AND get a groupId set up via sonatype's maven
service, or we could just start pushing the artifact under a groupId
we own (com.headius or org.jruby). Ideally we'd agree between all
users where to put it and handle (as a team) getting artifacts pushed.

I'm right in thinking this does not change often, right? Unless
there's visible API changes in 8, the same artifact will probably get
pushed once and not change for a long time.

It's up to you (Rémi) and other users whether we should also push the
jsr292-backport to maven. We're not using it in JRuby right now.

unsafe-mock and coro-mock:

I have pushed two new artifacts to maven: com.headius.unsafe-mock and
com.headius.coro-mock.

unsafe-mock is basically just JDK8's Unsafe.java in artifact form. You
would set up your build to fetch it, stick it into bootclasspath, and
compile. The intent is to provide a full Unsafe API for compilation
only; you must detect in your own code whether certain methods are
actually available at runtime. We created this artifact because we use
the new JDK8 "fences" API (when available) for our Ruby instance
variable tables, but did not want to require JDK8 to build JRuby.
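The "detect in your own code" step might look roughly like this reflective probe (a sketch of one possible approach; `UnsafeProbe` and `hasFences` are hypothetical names, not JRuby's actual code):

```java
// Sketch: unsafe-mock guarantees the API at compile time only, so probe
// reflectively at runtime before using JDK8-only methods like fullFence().
public class UnsafeProbe {
    public static boolean hasFences() {
        try {
            Class.forName("sun.misc.Unsafe").getMethod("fullFence");
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }
}
```

The probe result can be cached in a static final boolean so the JIT can fold away the dead branch.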

coro-mock is a mock of the latest coroutine API from Lukas, provided
in artifact form for the same reason. Since the API does not exist in
any release JDK, just adding to classpath/dependencies will allow
compiling against it. We use it for the Fiber (microthread) library in
JRuby (though I'd bet coro does not work anymore...still need to get a
JSR going for that).

- Charlie


Re: idea: MethodHandle#invokeTailCall

2013-05-11 Thread Charles Oliver Nutter
On Fri, May 10, 2013 at 7:16 PM, Per Bothner  wrote:

> Fail hard is probably the wrong thing to do - except when debugging.
> I think what you want is the default to not fail if it can't pop
> the stack frame, but that there be a VM option to throw an Error
> or even do a VM abort in those cases.  You'd run the test suite
> in this mode.
>
> That assumes that there is a well-specified minimal set of
> circumstances in which the inlining is done correctly,
> so a compiler or programmer can count on that, and that this
> set is sufficient for low-overhead tail-call elimination.
>

Making such guarantees would have to be explicit in the JVM spec, and then
we're sorta back to requiring a hard tail call guarantee (a hard inlining
guarantee to ensure tail calling happens is just a horse of a different
color).

There is actually a way to force inlining with the newer invokedynamic
impl: a @ForceInline (I forget the actual name) annotation that the
LambdaForm stuff uses internally. Now, if that were exposed as a standard
JVM feature, we could make such a hard guarantee...and then we're back to
having to tag calls or callees with annotations, which was something you
wanted to avoid (why, exactly?).


> I'll be happy when I can run Kawa with --full-tailcalls
> as the default with at most a minor performance degradation.
> If we don't get there, I'll be satisfied if at least it is
> faster (and simpler!) than the current trampoline-based
> implementation.


There's still a tail call patch in the MLVM repo, rotting on the vine. :-)

- Charlie


Re: Improving the speed of Thread interrupt checking

2013-05-11 Thread Charles Oliver Nutter
On Sat, May 11, 2013 at 3:37 AM, Alexander Turner wrote:

> Would not atomic increment and atomic decrement solve the multi-interrupt
> issue you suggest here? Such an approach is a little more costly because in
> the case of very high contention the setters need to spin to get the
> increment/decrement required if using pure CAS. That could be a lot of
> cache flushes - but it would then be strictly correct (I don't actually
> know how gcc or any other compiler goes about implementing add/sub):
>
> __sync_fetch_and_sub
> __sync_fetch_and_add
>

Yes, we could guarantee that all interrupts get seen and cleared
independently if we used an interrupt counter...but it's clear that's not
provided for by the contract of current Thread#interrupt logic, regardless
of how atomic you try to make it.

- Charlie


Re: Improving the speed of Thread interrupt checking

2013-05-11 Thread Charles Oliver Nutter
An addendum:

thread.interrupt *does* have other side effects, like breaking out of
blocking IO operations. However, it still doesn't matter; you can mutex to
try to guarantee that the IO interrupt and setting the bit happen
atomically, but by being in blocking IO you already know the thread is not
running. A different sequence:

* Thread A performs a blocking IO operation and gets stuck.
* Thread B attempts to interrupt it
* Thread B acquires the lock, sets interrupt bit, and wakes A out of IO
* Thread A wakes up and interrupt bit is set. It is retrieved, cleared, and
handled as true

But...

* Thread A performs a blocking IO operation and gets stuck.
* Thread B attempts to interrupt it
* Thread A wakes out of blocking IO, sees interrupt bit is not set, and
proceeds
* Thread B acquires the lock
* Thread A ultimately handles the interrupt as false

Now, depending on when B decides to proceed with actually interrupting
blocking IO, A may have already stopped blocking and B might not see
that...unless all interruptible blocking IO operations *also* acquire the
interrupt mutex. But that doesn't work either; B can't acquire the
interrupt mutex if A is holding it and blocking. If A releases the mutex
upon blocking and tries to acquire it immediately after, we're back to
square one...B may not see that A has completed blocking because A can't
return from a blocking operation and acquire the mutex atomically, and B
can't acquire the mutex and check blocking status atomically.

It seems like you can't make any guarantees here either, even with locks.

- Charlie


On Sat, May 11, 2013 at 3:26 AM, Charles Oliver Nutter
wrote:

> On Sat, May 11, 2013 at 2:49 AM, Jeroen Frijters wrote:
>
>> I believe Thread.interrupted() and Thread.isInterrupted() can both be
>> implemented without a lock or CAS.
>>
>> Here are correct implementations:
>>
> ...
>
>> Any interrupts that happen before we clear the flag are duplicates that
>> we can ignore and any that happen after are new ones that will be returned
>> by a subsequent call. The key insight is that the interruptPending flag can
>> be set by any thread, but it can only be cleared by the thread it applies
>> to.
>>
>
> This may indeed be the case. My goal with considering CAS was to maintain
> the full behavioral constraints of the existing implementation, which will
> never clear multiple interrupts at once, regardless of duplication.
>
> If your assumption holds, then Vitaly's case is not a concern. His case,
> again:
>
> * Thread A retrieves interrupt status
> * Thread B sets interrupt, but cannot clear it from outside of thread A
> * Thread A clears interrupt
>
> The end result of this sequence is indeed different if A's get + clear are
> not atomic: the interrupt status after A returns would be clear rather than
> set. However, *it does not really matter*.
>
> If we look at the *caller* of the interrupt checking, things become
> obvious.
>
> Mutexed/atomic version:
>
> * Thread A makes a call to Thread.interrupted to get and clear interrupt
> status
> * Thread A acquires lock and gets interrupt status and clears it atomically
> * Thread A returns from Thread.interrupt, reporting that the thread was
> interrupted, and the caller knows it has been cleared
> * Before Thread A proceeds any further (raising an error, etc), thread B
> comes in and sets interrupt status.
>
> The result is that the interrupt is set, and there's nothing A can do to
> ensure it has been cleared. A subsequent call to Thread.interrupted can be
> preempted *after* the clear anyway.
>
> So, a different preemption order with mutex:
>
> * Thread A makes a call to Thread.interrupted to get and clear interrupt
> status
> * Before the mutex is acquired, Thread B swoops in, setting interrupt
> status.
> * Thread A proceeds to acquire mutex and only sees a single interrupt bit;
> it gets status and clears it.
>
> So even an atomic version does nothing to guarantee what the interrupt
> status will be after all threads are finished fiddling with the interrupt
> bit; preemption can happen before or after the mutexed operation, producing
> different results in both cases.
>
> Ultimately, this may actually be a flaw with the way Thread interrupt
> works in the JVM. If there's potential for interrupt to be set twice or
> more, the interrupted thread can't ever guarantee that the interrupt has
> been cleared.
>
> In practice, this flaw may not matter; if you have one or more external
> threads that interrupt a target thread N times, you have to assume (and
> have always had to assume) the target thread will see anywhere from 1 to N
> of those interrupts, depending on preemption. This does not change with any
> of

Re: Improving the speed of Thread interrupt checking

2013-05-11 Thread Charles Oliver Nutter
On Sat, May 11, 2013 at 2:49 AM, Jeroen Frijters  wrote:

> I believe Thread.interrupted() and Thread.isInterrupted() can both be
> implemented without a lock or CAS.
>
> Here are correct implementations:
>
...

> Any interrupts that happen before we clear the flag are duplicates that we
> can ignore and any that happen after are new ones that will be returned by
> a subsequent call. The key insight is that the interruptPending flag can be
> set by any thread, but it can only be cleared by the thread it applies to.
>

This may indeed be the case. My goal with considering CAS was to maintain
the full behavioral constraints of the existing implementation, which will
never clear multiple interrupts at once, regardless of duplication.
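Since Jeroen's code was elided above, here is my own reconstruction of the pattern he describes (a sketch only, assuming a plain volatile flag; not his actual implementation):

```java
// Reconstruction of the described scheme: any thread may set the flag, but
// only the owning thread ever clears it, so no lock or CAS is required.
public class InterruptFlag {
    private volatile boolean interruptPending;

    // Callable from any thread (the Thread#interrupt() side).
    public void interrupt() { interruptPending = true; }

    // Called only by the thread this flag belongs to (Thread.interrupted()).
    public boolean getAndClear() {
        if (interruptPending) {
            interruptPending = false; // safe: we are the only clearer
            return true;
        }
        return false;
    }

    // Non-clearing check (Thread#isInterrupted()).
    public boolean isInterrupted() { return interruptPending; }
}
```

The check-then-clear in getAndClear is not atomic, but as argued above that only merges duplicate interrupts, which the caller can't distinguish anyway.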

If your assumption holds, then Vitaly's case is not a concern. His case,
again:

* Thread A retrieves interrupt status
* Thread B sets interrupt, but cannot clear it from outside of thread A
* Thread A clears interrupt

The end result of this sequence is indeed different if A's get + clear are
not atomic: the interrupt status after A returns would be clear rather than
set. However, *it does not really matter*.

If we look at the *caller* of the interrupt checking, things become obvious.

Mutexed/atomic version:

* Thread A makes a call to Thread.interrupted to get and clear interrupt
status
* Thread A acquires lock and gets interrupt status and clears it atomically
* Thread A returns from Thread.interrupt, reporting that the thread was
interrupted, and the caller knows it has been cleared
* Before Thread A proceeds any further (raising an error, etc), thread B
comes in and sets interrupt status.

The result is that the interrupt is set, and there's nothing A can do to
ensure it has been cleared. A subsequent call to Thread.interrupted can be
preempted *after* the clear anyway.

So, a different preemption order with mutex:

* Thread A makes a call to Thread.interrupted to get and clear interrupt
status
* Before the mutex is acquired, Thread B swoops in, setting interrupt
status.
* Thread A proceeds to acquire mutex and only sees a single interrupt bit;
it gets status and clears it.

So even an atomic version does nothing to guarantee what the interrupt
status will be after all threads are finished fiddling with the interrupt
bit; preemption can happen before or after the mutexed operation, producing
different results in both cases.

Ultimately, this may actually be a flaw with the way Thread interrupt works
in the JVM. If there's potential for interrupt to be set twice or more, the
interrupted thread can't ever guarantee that the interrupt has been cleared.

In practice, this flaw may not matter; if you have one or more external
threads that interrupt a target thread N times, you have to assume (and
have always had to assume) the target thread will see anywhere from 1 to N
of those interrupts, depending on preemption. This does not change with any
of the proposed implementations. The only safe situation is when you know
interruption will happen only once within a critical section of code.

Put simply (tl;dr): even with atomic/mutexed interrupt set+clear, you can't
make any guarantees about how many interrupts will be seen if multiple
interrupts are attempted. If true, the mutex in the current implementation
is 100% useless.

- Charlie


Re: Improving the speed of Thread interrupt checking

2013-05-11 Thread Charles Oliver Nutter
On Sat, May 11, 2013 at 1:46 AM, Alexander Turner wrote:

> Thanks for the explanation. I have recently (for the last 6 months) been
> involved with some very performance-centric multi-threaded work in
> profiling the JVM. Using JVMTI as a profiling tool with C++ underneath. The
> code all uses JVM locks where locks are required - but as profilers need to
> be as invisible as possible I have been removing locks where they can be
> avoided.
>
> My experience here has indicated that on modern machines CAS operations are
> always worth a try compared to locks. The cost of losing the current
> quantum (even on *NIX) is so high that it is not worth paying unless a
> thread is truly blocked - e.g. for IO.
>
...


> In your case, inter-thread signalling is definitely not worth losing a
> quantum over.
>
> If I get a chance over the next couple of days I'll make a cut-down
> example of CAS over Thread.interrupt and run the profiler (DevpartnerJ)
> over it - it could be a great unit test.
>

Yes, it could be illustrative. Finding this code in Hotspot also makes me
wonder what other VM-level state is "excessively guarded" by using locking
constructs instead of lock-free operations like CAS.

The code involved is also not particularly complex. I may see if I can hack
in a CAS version of the interrupt check+clear logic and see how things look
as a result. The next step would be moving that CAS directly into the
intrinsic, so it can optimize along with code calling it.

- Charlie


Re: Improving the speed of Thread interrupt checking

2013-05-10 Thread Charles Oliver Nutter
SwitchPoint is indeed an option, and I have used it in JRuby's compiler to
reduce the frequency of checking for interrupt events.

However in this case it is just a plain old library written in plain old
Java that supports Java 6+. Using Indy stuff isn't really an option.

Plus...it doesn't solve the performance issue of Thread.interrupted anyway
:-)
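For reference, Rémi's SwitchPoint scheme (quoted below) could be sketched roughly like this (my own illustration, assuming one SwitchPoint shared by all running parsers; class and method names are hypothetical):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.SwitchPoint;

public class TimeoutGuard {
    // One SwitchPoint shared by all running parsers; while it is valid the
    // guard folds to a constant "false" and costs essentially nothing.
    static final SwitchPoint TIMEOUTS = new SwitchPoint();
    static final MethodHandle CHECK = TIMEOUTS.guardWithTest(
            MethodHandles.constant(boolean.class, false),  // fast path: no timeout
            MethodHandles.constant(boolean.class, true));  // slow path: check the queue

    static boolean timedOut() throws Throwable {
        return (boolean) CHECK.invokeExact();
    }
}
```

A watchdog thread would call `SwitchPoint.invalidateAll(new SwitchPoint[] { TimeoutGuard.TIMEOUTS })` when a deadline expires, after which every parser sees `timedOut()` return true and goes to check the queue.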

- Charlie (mobile)
On May 10, 2013 5:48 PM, "Remi Forax"  wrote:

> On 05/10/2013 06:03 PM, Charles Oliver Nutter wrote:
> > This isn't strictly language-related, but I thought I'd post here
> > before I start pinging hotspot folks directly...
> >
> > We are looking at adding interrupt checking to our regex engine, Joni,
> > so that long-running (or never-terminating) expressions could be
> > terminated early. To do this we're using Thread.interrupt.
> >
> > Unfortunately our first experiments with it have shown that interrupt
> > checking is rather expensive; having it in the main instruction loop
> > slowed down a 16s benchmark to 68s. We're reducing that checking by
> > only doing it every N instructions now, but I figured I'd look into
> > why it's so slow.
> >
> > Thread.isInterrupted does currentThread().interrupted(), both of which
> > are native calls. They end up as intrinsics and/or calling
> > JVM_CurrentThread and JVM_IsInterrupted. The former is not a
> > problem...accesses threadObj off the current thread (presumably from
> > env) and twiddles handle lifetime a bit. The latter, however, has to
> > acquire a lock to ensure retrieval and clearing are atomic.
> >
> > So then it occurred to me...why does it have to acquire a lock at all?
> > It seems like a get + CAS to clear would prevent accidentally clearing
> > another thread's re-interrupt. Some combination of CAS operations
> > could avoid the case where two threads both check interrupt status at
> > the same time.
> >
> > I would expect the CAS version would have lower overhead than the hard
> > mutex acquisition.
> >
> > Does this seem reasonable?
> >
> > - Charlie
>
> Hi Charles,
> if a long-running expression is the exception, I think it's better to use
> a SwitchPoint
> (or a MutableCallSite stored in a static final field for that).
> Each regex being parsed first registers itself in a queue, and a thread
> waits on the first item of the queue. If the time has elapsed, the
> SwitchPoint is switched off, so each Joni parser thread knows that
> something went wrong and checks the first item. The parser that timed out
> removes itself from the queue and creates a new SwitchPoint.
> So the check is done only when a parser runs too long.
>
> Rémi
>


Re: Improving the speed of Thread interrupt checking

2013-05-10 Thread Charles Oliver Nutter
For your ABA case, I can think of a couple options:

* instead of get, do getAndSet when clearing. Whether it is true or false,
it will end up false, so clearing is not a big deal. However, we're always
doing the write then, so perhaps...
* CAS(true, false) instead of just reading. If set, it will be cleared. If
unset, CAS will fail and we know it was not set. Again, not sure about the
cost of this versus the simple read. It should usually fail, and I don't
know that cost either.

I am not sure at what point the lock becomes the cheaper option, but it
seems like it would still be more expensive than either of these.
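The two options above, sketched at user level with AtomicBoolean (a hypothetical illustration, not the actual VM-internal code, which lives in Hotspot):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CasInterrupt {
    private final AtomicBoolean interrupted = new AtomicBoolean();

    public void interrupt() { interrupted.set(true); }

    // Option 1: getAndSet(false) -- always writes, but get+clear is one
    // atomic step.
    public boolean clearViaGetAndSet() {
        return interrupted.getAndSet(false);
    }

    // Option 2: CAS(true -> false) -- if the CAS succeeds the flag was set
    // and is now cleared; if it fails, the flag was not set to begin with.
    public boolean clearViaCas() {
        return interrupted.compareAndSet(true, false);
    }
}
```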

And the clearing case is actually the common one; most users call
Thread.interrupted, which gets and clears all at once. Even if you use the
non-clearing Thread#isInterrupted, you probably still need to clear it
after you respond to the interruption...in our case, raising an appropriate
error to indicate the regex did not return in a reasonable amount of time.
We don't want the interrupt flag to linger after that error is handled.

- Charlie (mobile)
On May 10, 2013 6:51 PM, "Vitaly Davidovich"  wrote:

> How would you handle the following with just CAS:
> 1) thread A reads the status and notices that it's set, and then gets
> preempted
> 2) thread B resets the interrupt and then sets it again
> 3) thread A resumes and does a CAS expecting the current state to be
> interrupted, which it is - CAS succeeds and resets interrupt
>
> The problem is that it just reset someone else's interrupt and not the one
> it thought it was resetting - classic ABA problem.
>
> You'd probably need some ticketing/versioning built in there to detect
> this; perhaps use a uint with 1 bit indicating status and the rest is
> version number - then can do CAS against that encoded value.
>
> However, I'm not sure if this case (checking interrupt and clearing) is
> all that common - typically you just check interruption only - and so
> unclear if this is worthwhile.
>
> Sent from my phone
> On May 10, 2013 12:05 PM, "Charles Oliver Nutter" 
> wrote:
>
>> This isn't strictly language-related, but I thought I'd post here before
>> I start pinging hotspot folks directly...
>>
>> We are looking at adding interrupt checking to our regex engine, Joni, so
>> that long-running (or never-terminating) expressions could be terminated
>> early. To do this we're using Thread.interrupt.
>>
>> Unfortunately our first experiments with it have shown that interrupt
>> checking is rather expensive; having it in the main instruction loop slowed
>> down a 16s benchmark to 68s. We're reducing that checking by only doing it
>> every N instructions now, but I figured I'd look into why it's so slow.
>>
>> Thread.isInterrupted does currentThread().interrupted(), both of which
>> are native calls. They end up as intrinsics and/or calling
>> JVM_CurrentThread and JVM_IsInterrupted. The former is not a
>> problem...accesses threadObj off the current thread (presumably from env)
>> and twiddles handle lifetime a bit. The latter, however, has to acquire a
>> lock to ensure retrieval and clearing are atomic.
>>
>> So then it occurred to me...why does it have to acquire a lock at all? It
>> seems like a get + CAS to clear would prevent accidentally clearing another
>> thread's re-interrupt. Some combination of CAS operations could avoid the
>> case where two threads both check interrupt status at the same time.
>>
>> I would expect the CAS version would have lower overhead than the hard
>> mutex acquisition.
>>
>> Does this seem reasonable?
>>
>> - Charlie
>>


Re: Improving the speed of Thread interrupt checking

2013-05-10 Thread Charles Oliver Nutter
You need CAS because one form of the interrupt check clears it and another
does not. So the get + check + set of interrupt status needs to be atomic,
or another thread could jump in and change it during that process.

If it were just being read, then sure...it could simply be volatile. But
since there's a non-atomic operation in there, a race might be possible.

I just took a deeper look at the intrinsic, to see if it avoids the
lock...but unfortunately it does not. It adds fast paths for when the
thread is not interrupted *and* clearing is not requested (Thread.interrupted
clears, Thread#isInterrupted does not). So the typical use case of calling
Thread.interrupted() to get and clear interrupt status still follows the
slow, locking path all the time.

We are mitigating this in our code by using Thread#isInterrupted()
(th.isInterrupted on a Thread object) to do the frequent checks, and then
using Thread.interrupted to clear it only when it has been set. I think
this will be ok, but the slow path still seems like it could benefit from a
CAS impl instead of a lock.
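The mitigation described above, roughly (a sketch; how the caller turns a seen interrupt into a "regex timed out" error is up to the embedding code):

```java
public class InterruptCheck {
    // Frequent check: Thread#isInterrupted() does not clear, so it can take
    // the intrinsic fast path; only clear (the slow, locking path) when the
    // flag is actually set.
    public static boolean checkAndClear() {
        if (Thread.currentThread().isInterrupted()) {
            Thread.interrupted(); // get-and-clear, hit only rarely
            return true;          // caller raises its timeout error here
        }
        return false;
    }
}
```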

- Charlie


On Fri, May 10, 2013 at 11:17 AM, Alexander Turner
wrote:

> Charles,
>
> Why bother even using CAS?
>
> Thread A is monitoring Thread B. Thread B cooperatively checks to see if
> it should die.
>
> Therefore, you only need B to know when A has told it to shut down.
>
> Therefore, all you need is a volatile boolean. A volatile boolean is very
> much faster than a full CAS operation.
> http://nerds-central.blogspot.co.uk/2011/11/atomicinteger-volatile-synchronized-and.html
>
> Best wishes - AJ
>
>
> On 10 May 2013 17:03, Charles Oliver Nutter  wrote:
>
>> This isn't strictly language-related, but I thought I'd post here before
>> I start pinging hotspot folks directly...
>>
>> We are looking at adding interrupt checking to our regex engine, Joni, so
>> that long-running (or never-terminating) expressions could be terminated
>> early. To do this we're using Thread.interrupt.
>>
>> Unfortunately our first experiments with it have shown that interrupt
>> checking is rather expensive; having it in the main instruction loop slowed
>> down a 16s benchmark to 68s. We're reducing that checking by only doing it
>> every N instructions now, but I figured I'd look into why it's so slow.
>>
>> Thread.isInterrupted does currentThread().interrupted(), both of which
>> are native calls. They end up as intrinsics and/or calling
>> JVM_CurrentThread and JVM_IsInterrupted. The former is not a
>> problem...accesses threadObj off the current thread (presumably from env)
>> and twiddles handle lifetime a bit. The latter, however, has to acquire a
>> lock to ensure retrieval and clearing are atomic.
>>
>> So then it occurred to me...why does it have to acquire a lock at all? It
>> seems like a get + CAS to clear would prevent accidentally clearing another
>> thread's re-interrupt. Some combination of CAS operations could avoid the
>> case where two threads both check interrupt status at the same time.
>>
>> I would expect the CAS version would have lower overhead than the hard
>> mutex acquisition.
>>
>> Does this seem reasonable?
>>
>> - Charlie
>>


Re: idea: MethodHandle#invokeTailCall

2013-05-10 Thread Charles Oliver Nutter
Interesting idea...comments below.

On Fri, May 10, 2013 at 12:44 PM, Per Bothner  wrote:

> So this idea came to me: Could we just add a method
> that tail-calls a MethodHandle?  Maybe some variant of
>MethodHandle#invokeAsTailCall(Object... )
> This doesn't require instruction-set or classfile changes,
> "only" a new intrinsic method.  Of course it's a bit more
> complex than that: The actual tailcall to be useful has
> to be done in the method that does the invokeAsTailCall,
> not the invokeAsTailCall itself.  I.e. the implementation
> of invokeAsTailCall has to pop not only its own (native) stack
> frame, but also the caller's.
>

Seems feasible to me. Ideally in any case where Hotspot can inline a method
handle call (generally only if it's static final (?) or in constant pool
some other way) it should also be able to see that this is a tail
invocation of a method call.

However...there are cases where Hotspot *can't* inline the handle
(dynamically prepared, etc) in which cases you'd want invokeAsTailCall to
fail hard, right? Or if it didn't fail...you're not fulfilling the promise
of a tail call, and we're back to the debates about whether JVM should
support "hard" or "soft" tail calling guarantees.


> One problem with invokeAsTailCall is it implies
> needless boxing, which may be hard to optimize away.
> Perhaps a better approach would be to use invokedynamic
> in some special conventional way, like with a magic
> CallSite.  However, that makes calling from Java more
> difficult.
>

Making it signature-polymorphic, like invokeExact and friends, would avoid
the boxing (when inlined).
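For comparison, here is how an existing signature-polymorphic call avoids boxing today (plain invokeExact, not the proposed tail-call variant; `callMax` is a hypothetical wrapper for illustration):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ExactCall {
    static int callMax(int a, int b) throws Throwable {
        MethodHandle max = MethodHandles.lookup().findStatic(
                Math.class, "max",
                MethodType.methodType(int.class, int.class, int.class));
        // Signature-polymorphic: the ints flow through unboxed (no Object[]
        // varargs, no Integer boxing) when the call site inlines.
        return (int) max.invokeExact(a, b);
    }
}
```

An invokeAsTailCall with the same signature-polymorphic treatment would presumably keep that property.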

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Improving the speed of Thread interrupt checking

2013-05-10 Thread Charles Oliver Nutter
This isn't strictly language-related, but I thought I'd post here before I
start pinging hotspot folks directly...

We are looking at adding interrupt checking to our regex engine, Joni, so
that long-running (or never-terminating) expressions could be terminated
early. To do this we're using Thread.interrupt.

Unfortunately our first experiments with it have shown that interrupt
checking is rather expensive; having it in the main instruction loop slowed
down a 16s benchmark to 68s. We're reducing that checking by only doing it
every N instructions now, but I figured I'd look into why it's so slow.

Thread.isInterrupted does currentThread().interrupted(), both of which are
native calls. They end up as intrinsics and/or calling JVM_CurrentThread
and JVM_IsInterrupted. The former is not a problem...accesses threadObj off
the current thread (presumably from env) and twiddles handle lifetime a
bit. The latter, however, has to acquire a lock to ensure retrieval and
clearing are atomic.

So then it occurred to me...why does it have to acquire a lock at all? It
seems like a get + CAS to clear would prevent accidentally clearing another
thread's re-interrupt. Some combination of CAS operations could avoid the
case where two threads both check interrupt status at the same time.

I would expect the CAS version would have lower overhead than the hard
mutex acquisition.

Does this seem reasonable?

- Charlie


Re: JVM Summit Workshop/talk request

2013-04-12 Thread Charles Oliver Nutter
I think we can safely say there's a lot of interest in a European
JVMLS-like event. I'll make every effort to be there if one is
hosted...but I'll leave it up to you European folks to figure out
where and when :-)

Just give me plenty of advance notice ;-)

- Charlie

On Fri, Apr 12, 2013 at 5:28 AM, Ben Evans
 wrote:
> +1
>
> Stockholm's nice. Or there's always London...
>
>
> On Fri, Apr 12, 2013 at 6:36 AM, Marcus Lagergren
>  wrote:
>>
>> +1 to that.  We could probably host something in Stockholm too, if there
>> is interest. (We are the third largest Oracle JVM engineering site in the
>> world after Santa Clara and Burlington, MA).
>>
>> /M
>>
>> On Apr 11, 2013, at 10:09 PM, Charles Oliver Nutter 
>> wrote:
>>
>> > I would absolutely love to have a European edition of the JVM Language
>> > Summit. It's not a very complicated event to put together, and it
>> > would give us an opportunity to meet with more language folks than can
>> > make it to California for JVMLS.
>> >
>> > You could count on my attendance.
>> >
>> > - Charlie
>> >
>> > On Thu, Apr 11, 2013 at 5:30 AM, MacGregor, Duncan (GE Energy
>> > Management)  wrote:
>> >> I would certainly be interested, though travel budgets do seem to be
>> >> tight
>> >> this year.
>> >>
>> >> We could probably host it here in Cambridge if you guys want to come
>> >> over
>> >> to the UK.
>> >>
>> >> On 09/04/2013 08:19, "Julien Ponge"  wrote:
>> >>
>> >>> Just an idea: would some of you be interested in having a meeting at
>> >>> some
>> >>> point in Europe?
>> >>>
>> >>> I (or Rémi) can probably organise something at our Unis.
>> >>>
>> >>> - Julien
>> >>>
>> >>> On Apr 9, 2013, at 4:55 AM, Mark Roos  wrote:
>> >>>
>> >>>> Thanks for the interest.
>> >>>>
>> >>>> I added this workshop to my proposal.  Inputs are welcome on how to
>> >>>> make it a good workshop.
>> >>>>
>> >>>> mark
>> >>>>
>> >>>> Improving the performance of InvokeDynamic
>> >>>>
>> >>>> Now that we have some experience with InvokeDynamic it's time
>> >>>> to discuss strategies and efforts for performance improvement.
>> >>>> We expect to have experts, HotSpot implementers and users
>> >>>> discussing how to get the best performance
>> >>>> possible.
>> >>>
>> >>
>>


Re: JVM Summit Workshop/talk request

2013-04-11 Thread Charles Oliver Nutter
I would absolutely love to have a European edition of the JVM Language
Summit. It's not a very complicated event to put together, and it
would give us an opportunity to meet with more language folks than can
make it to California for JVMLS.

You could count on my attendance.

- Charlie

On Thu, Apr 11, 2013 at 5:30 AM, MacGregor, Duncan (GE Energy
Management)  wrote:
> I would certainly be interested, though travel budgets do seem to be tight
> this year.
>
> We could probably host it here in Cambridge if you guys want to come over
> to the UK.
>
> On 09/04/2013 08:19, "Julien Ponge"  wrote:
>
>>Just an idea: would some of you be interested in having a meeting at some
>>point in Europe?
>>
>>I (or Rémi) can probably organise something at our Unis.
>>
>>- Julien
>>
>>On Apr 9, 2013, at 4:55 AM, Mark Roos  wrote:
>>
>>> Thanks for the interest.
>>>
>>> I added this workshop to my proposal.  Inputs are welcome on how to
>>> make it a good workshop.
>>>
>>> mark
>>>
>>> Improving the performance of InvokeDynamic
>>>
>>> Now that we have some experience with InvokeDynamic, it's time
>>> to discuss strategies and efforts for performance improvement.
>>> We expect to have experts, HotSpot implementers and users
>>> discussing how to get the best performance
>>> possible.
>>> ___
>>> mlvm-dev mailing list
>>> mlvm-dev@openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>>
>>___
>>mlvm-dev mailing list
>>mlvm-dev@openjdk.java.net
>>http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [jvm-l] Improving the performance of stacktrace generation

2013-04-11 Thread Charles Oliver Nutter
I talked a bit with John Rose about this, and he agreed with me that a
good partial measure might be to add APIs for getting a *partial*
stack.

Currently, Hotspot will limit how deep a stack trace it generates.
This can have a very large impact on the performance of generating
traces.

The magic flag is -XX:MaxJavaStackTraceDepth=, and the default on
my system is 1024. Here's a set of benchmarks of various trace depths
from 1000 down to 2. Once you get down to 100 frames, performance of
generating a stack trace starts to improve considerably.

https://gist.github.com/headius/5365217
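The shape of the measurement is roughly this (a simplified sketch, not the gist's actual benchmark; class and method names are mine): recurse until the stack is the desired depth, then time repeated trace captures while still that deep.

```java
public class TraceDepthBench {
    // Recurse `depth` frames, then time `reps` stack-trace materializations
    // while the stack is still that deep. Deeper stacks cost more per trace.
    static long timeTraces(int depth, int reps) {
        if (depth > 0) return timeTraces(depth - 1, reps);
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            new Throwable().getStackTrace(); // fills in and converts the trace
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        System.out.println("1000 deep: " + timeTraces(1000, 100) + " ns");
        System.out.println("  10 deep: " + timeTraces(10, 100) + " ns");
    }
}
```

Run with different `-XX:MaxJavaStackTraceDepth` values to see the cap's effect directly.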

Unfortunately there's no API to get just a partial stack trace, via
JVMTI or otherwise. The relevant code in Hotspot itself is rather
simple; I started prototyping a JNI call that would allow getting a
partial trace. Perhaps something like:

thread.getStackTrace(depth)

...and something equivalent for JVMTI.
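For the record, later JDKs grew exactly this capability: since Java 9, `StackWalker` can walk the stack lazily and stop after N frames without materializing the full trace. A sketch of what the proposed `getStackTrace(depth)` buys you (the helper name `topFrames` is mine, not an API):

```java
import java.util.List;
import java.util.stream.Collectors;

public class PartialTrace {
    // Fetch only the top `depth` frames of the current thread's stack.
    // StackWalker streams frames lazily, so limit(depth) avoids walking
    // (and allocating) the rest of the stack.
    static List<String> topFrames(int depth) {
        return StackWalker.getInstance().walk(frames ->
            frames.limit(depth)
                  .map(f -> f.getClassName() + "." + f.getMethodName())
                  .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(topFrames(3));
    }
}
```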

John agreed that this would be a worthwhile feature for a JEP, and I'd
certainly like to see it trickle into a standard API too.

- Charlie

On Thu, Apr 11, 2013 at 3:37 AM,   wrote:
> Hi Bob,
>
> I wrote an article last year on the cost and impact of JVMTI stack collection.
>
> http://www.jinspired.com/site/is-jvm-call-stack-sampling-suitable-for-monitoring-low-latency-trading-apps
>
> I would prefer to see the JVM come up with a standard API and mechanism to 
> allow the stack to be augmented with additional frames that not only include 
> Java code but more contextual information related to executing activity 
> (code, block, flow); this would include other JVM languages.
>
> We provide this sort of thing already today for Java, JRuby/Ruby and 
> Jython/Python, even SQL, in our metering engine but would welcome an ability 
> to replicate this data to the VM itself so standard tools need not be 
> changed. What is cool about this is that we can simulate a stack in a remote 
> JVM that spans multiple real application runtimes.
>
> http://www.jinspired.com/site/jxinsight-opencore-6-4-ea-12-released
>
> Kind regards,
>
> William
>
>>-Original Message-
>>From: Bob Foster [mailto:bobfos...@gmail.com]
>>Sent: Sunday, July 8, 2012 01:32 AM
>>To: jvm-langua...@googlegroups.com
>>Cc: 'Da Vinci Machine Project'
>>Subject: Re: [jvm-l] Improving the performance of stacktrace generation
>>
>>> Any thoughts on this? Does anyone else have need for
>>lighter-weight name/file/line inspection of the call stack?
>>
>>Well, yes. Profilers do.
>>
>>Recall Cliff Click bragging a couple of years ago at the JVM Language
>>Summit about how fast stack trace generation is in Azul Systems' OSs...and
>>knocking Hotspot for being so slow. It turns out that stack trace
>>generation is a very significant overhead in profiling Hotspot using JVMTI.
>>Even CPU sampling on 20 ms. intervals can add 3% or more to execution time,
>>almost entirely due to the delay in reaching a safe point (which also
>>guarantees the profile will be incorrect) and generating a stack trace for
>>each thread.
>>
>>But 3% is peanuts compared to the cost of memory profiling, which can
>>require a stack trace on every new instance creation. In a profiler I wrote
>>using JVMTI, I discovered that it was faster to call into JNI code on every
>>method entry and exit (and exception catch), keeping a stack trace
>>dynamically than to call into JNI only when memory was allocated and
>>request a stack trace each time. The "fast" technique is about 3-10 times
>>slower than running without profiling. The Netbeans profiler doesn't use
>>this optimization, and its memory profiler when capturing every allocation,
>>as I did, is 2-3 ORDERS OF MAGNITUDE slower than normal (non-server)
>>execution.
>>
>>Faster stack traces would benefit the entire Hotspot profiling community.
>>
>>Bob
>>
>>On Sat, Jul 7, 2012 at 3:03 PM, Charles Oliver Nutter
>>wrote:
>>
>>> Today I have a new conundrum for you all: I need stack trace
>>> generation on Hotspot to be considerably faster than it is now.
>>>
>>> In order to simulate many Ruby features, JRuby (over)uses Java stack
>>> traces. We recently (JRuby 1.6, about a year ago) moved to using the
>>> Java stack trace as the source of our Ruby backtrace information,
>>> mining out compiled frames and using interpreter markers to peel off
>>> interpreter frames. The result is that a Ruby trace with mixed
>>> compiled and interpreted code like this
>>> (https://gist.github.com/3068210) turns into this
>>> (https://gist.github.com/3068213). I consider this a great deal better
>>> than the plain J

Re: JVM Summit Workshop/talk request

2013-04-08 Thread Charles Oliver Nutter
I will volunteer to be an expert.

On Mon, Apr 8, 2013 at 2:53 PM, Mark Roos  wrote:
> I would love to put it together, but my knowledge is minimal.  I don't mind
> the
> organizing part but I think we need some folks from the jvm side to be the
> main
> speaker/know it all(s).
>
> So if we can get some volunteers to be the experts I will gladly propose and
> mc
> a workshop/panel
>
> mark
>
>
>
>
> From: Charles Oliver Nutter 
> To: Da Vinci Machine Project 
> Date: 04/08/2013 12:26 PM
> Subject: Re: JVM Summit Workshop/talk request
> Sent by: mlvm-dev-boun...@openjdk.java.net
> 
>
>
>
> Indeed...I think we need to get all us invokedynamicists into the same
> room to better understand what's working, what's not, and where to go
> from here. Consider me in.
>
> I'm sure it would be accepted, so a proposal would probably be a
> formality...but do you want to throw something together, Mark?
>
> - Charlie
>
> On Mon, Apr 8, 2013 at 2:03 PM, Mark Roos  wrote:
>> It seems like quite a bit of work is going on around improving the
>> performance of invokeDynamic.
>> It would be interesting ( at least to me ) to have an in depth discussion
>> of
>> what is being done and
>> how I should adjust my usage to get the best performance for a dynamic
>> language.
>>
>> I'll buy the drinks
>>
>> mark
>>
>>
>> ___
>> mlvm-dev mailing list
>> mlvm-dev@openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: JVM Summit Workshop/talk request

2013-04-08 Thread Charles Oliver Nutter
Indeed...I think we need to get all us invokedynamicists into the same
room to better understand what's working, what's not, and where to go
from here. Consider me in.

I'm sure it would be accepted, so a proposal would probably be a
formality...but do you want to throw something together, Mark?

- Charlie

On Mon, Apr 8, 2013 at 2:03 PM, Mark Roos  wrote:
> It seems like quite a bit of work is going on around improving the
> performance of invokeDynamic.
> It would be interesting ( at least to me ) to have an in depth discussion of
> what is being done and
> how I should adjust my usage to get the best performance for a dynamic
> language.
>
> I'll buy the drinks
>
> mark
>
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Looking for comments on paper draft "DynaMate: Simplified and optimized invokedynamic dispatch"

2013-04-04 Thread Charles Oliver Nutter
If it's not too late...I'd like to see the paper too :-)

And I also wonder whether we should start consolidating approaches a
bit. InvokeBinder has become very feature-rich, now providing the
ability to track arguments by name through the MH chain. I'm hoping to
fill it out more and do a new release soon, but I'm using it for just
about all my MH wrangling.

- Charlie

On Tue, Feb 19, 2013 at 7:37 AM, Eric Bodden  wrote:
> Hi all.
>
> Kamil Erhard, a student of mine, and myself have prepared a paper
> draft on a novel framework for invokedynamic dispatch that we call
> DynaMate. The framework is meant to aid language developers in using
> java.lang.invoke more easily by automatically taking care of common
> concerns like guarding and caching of method handles or adapting
> arguments between callers and callees.
>
> By March 28th, we plan to submit the draft to OOPSLA, at which point
> we will probably also make the publication available as a Technical
> Report, and will also open-source the implementation. Right now, I
> would like to use this email to reach out to experts in the community
> to get some feedback on this work, both in terms of what could be
> improved w.r.t. the paper and in terms of the DynaMate framework
> itself.
>
> So please let me know if you are interested in obtaining a copy of the
> draft to then provide us with feedback. In this case I would email you
> the PDF some time this week.
>
> Best wishes,
> Eric
>
> P.S. Here is the current abstract:
>
> Version 7 of the Java runtime includes a novel invokedynamic bytecode
> and API, which allow the implementers of programming languages
> targeting the Java Virtual Machine to customize the dispatch semantics
> at every invokedynamic call site. This mechanism is quite powerful and
> eases the implementation of dynamic languages, but it is also hard to
> handle, as it allows for many degrees of freedom and much room for
> error. While implementers of some dynamic languages have successfully
> switched to using invokedynamic, others are struggling with the steep
> learning curve.
> We present DYNAMATE, a novel framework allowing dynamic-language
> implementers to define dispatch patterns more easily. Implementations
> using DYNAMATE achieve reduced complexity, improved maintainability,
> and optimized performance. Moreover, future improvements to DYNAMATE
> can benefit all its clients.
> As we show, it is easy to modify the implementations of Groovy, JCop,
> JRuby, Jython to base their dynamic dispatch on DYNAMATE. A set of
> representative benchmarks shows that DYNAMATE-enabled dispatch code
> usually achieves equal or better performance compared to the code that
> those implementations shipped with originally. DYNAMATE is available
> as an open-source project.
>
> --
> Eric Bodden, Ph.D., http://sse.ec-spride.de/ http://bodden.de/
> Head of Secure Software Engineering Group at EC SPRIDE
> Tel: +49 6151 16-75422Fax: +49 6151 16-72051
> Room 3.2.14, Mornewegstr. 30, 64293 Darmstadt
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Perf regression since b72

2013-03-30 Thread Charles Oliver Nutter
I've been fiddling about with performance a bit again recently, and have
noticed a perf degradation since b72. I mentioned this to the Nashorn guys
and Marcus discovered that InlineSmallCode=2000 helped them get back to b72
performance. I can confirm this on JRuby as well, but in any case it seems
that something has regressed.

Here's some numbers with JRuby. Numbers are for b72, hotspot-comp, and
hotspot-comp with InlineSmallCode=2000. You can see that current
hotspot-comp builds do not perform as well as b72 unless that flag is
passed.

https://gist.github.com/headius/de7f99b52847c2436ee4

I have not yet started to explore the inlining or assembly output, but I
wanted to confirm that others are seeing this degradation.

My build of hotspot-comp is current.

I do have some benchmarks that look fine without the additional flag
(neural_net, for example), so I'm confused what's different in the degraded
cases.

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: New Ruby impl based on PyPy...early perf numbers ahead of JRuby

2013-02-09 Thread Charles Oliver Nutter
On Sat, Feb 9, 2013 at 1:07 PM, Thomas Wuerthinger
 wrote:
> Do you also have startup performance metrics - I assume the numbers below
> are about peak performance?

It seems to warm up very quickly; there's sometimes 2x slower perf on
the first iteration, but it rapidly settles. Overall startup time is
considerably better than JRuby.

> What is the approximate % of language feature completeness of Topaz, and do
> you think this aspect is relevant when comparing performance?

Hard to say. They consulted me and other Ruby implementers to learn
the most difficult features to implement, and made sure they put those
in place. But I've had a lot of trouble with these benchmarks,
partially due to missing language features.

The specific language features we recommended they implement before
measuring perf are mostly related to closure state and cross-frame
variable access. In JRuby, such things require allocation on the heap,
and since closure-receiving methods don't specialize (or do
context-sensitive caller-callee profiling) EA can't ever get rid of
those structures. The allocation and value indirection kills us.

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


New Ruby impl based on PyPy...early perf numbers ahead of JRuby

2013-02-09 Thread Charles Oliver Nutter
So, that new Ruby implementation I hinted at was announced this week.
It's called Topaz, and it's based on the RPython/PyPy toolchain.

It's still very early days, of course, since the vast majority of Ruby
core has not been implemented yet. But for the benchmarks it can run,
it usually beats JRuby + invokedynamic.

Some numbers...

Richards is 4-5x faster on Topaz than JRuby.

Red/black is a bit less than 2x faster on Topaz than the JRuby with
the old indy impl and a bit more than 2x faster than the JRuby with
the new impl.

Tak and fib are each about 10x faster on JRuby. Topaz's JIT is
probably not working right here, perhaps because the benchmarks are
deeply recursive.

Neural is a bit less than 2x faster on Topaz than on JRuby.

I had to do a lot of massaging to get these benchmarks to run due to
Topaz's very-incomplete core classes, but you can see where Topaz
could potentially give us a run for our money. In general, Topaz is
already faster than JRuby, and still implements most of the
"difficult" Ruby language features that usually hurt performance.

My current running theory for a lot of this performance is the fact
that the RPython/PyPy toolchain does a better job than Hotspot in two
areas:

* It is a tracing JIT, so I believe it's specializing code better. For
example, closures passed through a common piece of code appear to
still optimize as though they're monomorphic all the way. If we're
ever going to have closures (or lambdas) perform as well as they
should, closure-receiving methods need to be able to specialize.
* It does considerably better at escape detection than Hotspot's
current escape analysis. Topaz does *not* use tagged integers, and yet
numeric performance is easily 10x better than JRuby. This also plays
into closure performance.

Anyway, I thought I'd share these numbers, since they show we've got
more work to do to get JVM-based dynamic languages competitive with
purpose-built dynamic language VMs. I'm not really *worried* per se,
since raw language performance rarely translates into application
performance (app perf is much more heavily dependent on the
implementation of core classes, which are all Java code in JRuby and
close to irreducible, perf-wise), but I'd obviously like to see us
stay ahead of the game :-)

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Symbolic argument support in InvokeBinder

2013-02-02 Thread Charles Oliver Nutter
Sitting here at FOSDEM today I was showing Remi my new addition to
InvokeBinder: named arguments.

Background: InvokeBinder is my little Java DSL/fluent API for building
method handle chains. Short example:

MethodHandle mh = Binder
   .from(String.class, String.class, String.class)       // String w(String, String)
   .drop(1, String.class)                                // String x(String)
   .insert(0, "hello")                                   // String y(String, String)
   .cast(String.class, CharSequence.class, Object.class) // String z(CharSequence, Object)
   .invoke(someTargetHandle);
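For comparison, here is roughly the same adaptation built with the raw java.lang.invoke API, target-first, so the steps read in reverse order of the fluent DSL (the target method `z` is a hypothetical stand-in for `someTargetHandle`):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class RawChain {
    // Hypothetical target: String z(CharSequence, Object)
    static String z(CharSequence a, Object b) { return a + ":" + b; }

    // Build the equivalent of the Binder chain with raw combinators.
    static MethodHandle build() throws Exception {
        MethodHandle target = MethodHandles.lookup().findStatic(RawChain.class, "z",
            MethodType.methodType(String.class, CharSequence.class, Object.class));
        MethodHandle mh = target.asType(
            MethodType.methodType(String.class, String.class, String.class)); // cast
        mh = MethodHandles.insertArguments(mh, 0, "hello");    // insert(0, "hello")
        mh = MethodHandles.dropArguments(mh, 1, String.class); // drop(1, String.class)
        return mh; // type: String (String, String), like Binder.from(...)
    }

    public static void main(String[] args) throws Throwable {
        System.out.println((String) build().invokeExact("a", "ignored")); // "hello:a"
    }
}
```

The reversed ordering is a big part of why a source-to-target DSL is easier to read than chaining the combinators by hand.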

The new stuff I added is a Signature class for managing a MethodType
along with an array of argument names, and SmartBinder to take
advantage of that. How is this useful? The above example might be
reworked as follows:

Signature sig = Signature
.returning(String.class)
.appendArg("arg1", String.class)
.appendArg("arg2", String.class);

MethodHandle mh = SmartBinder
   .from(sig)
   .drop("arg2")                                         // String x(String)
   .prepend("argX", "hello")                             // String y(String, String)
   .cast(String.class, CharSequence.class, Object.class) // String z(CharSequence, Object)
   .invoke(someTargetHandle);

So we can always use the argument names rather than error-prone
indices. This is especially useful for permutes, which I consistently
get completely wrong:

MethodHandle incoming = handle with signature below;

Signature sig = Signature
.returning(String.class)
.appendArg("arg1", String.class)
.appendArg("arg2", String.class)
.appendArg("arg3", String.class);

// permute without indices!
MethodHandle permuted = sig.permuteWith(incoming, "arg1", "arg3");
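To see what the names buy you, here is roughly what `permuteWith` has to expand to with raw `MethodHandles.permuteArguments` (the target `pair` is a hypothetical stand-in for `incoming`):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class PermuteDemo {
    // Hypothetical target wanting only (arg1, arg3) out of (arg1, arg2, arg3)
    static String pair(String arg1, String arg3) { return arg1 + "/" + arg3; }

    static String call(String a, String b, String c) throws Throwable {
        MethodHandle target = MethodHandles.lookup().findStatic(PermuteDemo.class,
            "pair", MethodType.methodType(String.class, String.class, String.class));
        // Raw equivalent of sig.permuteWith(incoming, "arg1", "arg3"): the
        // reorder array maps each target parameter to an index in the wider
        // incoming type -- exactly the error-prone bookkeeping names avoid.
        MethodType wide = MethodType.methodType(String.class,
            String.class, String.class, String.class); // (arg1, arg2, arg3)
        MethodHandle permuted = MethodHandles.permuteArguments(target, wide, 0, 2);
        return (String) permuted.invokeExact(a, b, c);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(call("A", "B", "C")); // arg2 never reaches the target
    }
}
```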

This is not in an InvokeBinder release yet because I want to add all
Binder operations to SmartBinder, but I'm looking for feedback and
other use cases for named arguments in the signature. Thanks!

- Charlie
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: hotspot-comp OS X builds

2013-01-25 Thread Charles Oliver Nutter
Kris answered about JDK8. As far as JDK7u, you can follow the 7u
mailing list. It looks like u12 has been renumbered to u14 (probably
to make room for any additional security releases that might be needed
in a u13), but you basically just want to track hs24-bXX and related
commit info.

Others on this list might be able to give you a more definitive
answer, but I have mostly been tracking Hotspot versions to know what
features are where.

FWIW, JRuby actually inspects Hotspot version to know whether to
default invokedynamic use to "on", since only hs24+ has fixed the
NCDFE issue.

- Charlie

On Fri, Jan 25, 2013 at 5:56 AM, MacGregor, Duncan (GE Energy
Management)  wrote:
> Can I just check whether all this stuff has made it into the 7u12 or 8
> snapshot releases, and if not when it will?
>
> Alternatively I can do a Windows build myself from source if its all made
> it into the public repos.
>
> On 24/01/2013 22:47, "John Rose"  wrote:
>
>>Thanks, Charlie!
>>
>>Yes, feedback makes us happy, especially small-but-representative
>>benchmarks.
>>
>>‹ John
>>
>>On Jan 24, 2013, at 1:21 PM, Charles Oliver Nutter wrote:
>>
>>> I did some builds of hotspot-comp as of this afternoon for y'all to
>>> download. This has the permgen removal, new indy impl + opto, partial
>>> inlining, and other bits and bobs.
>>>
>>> I'm sure the Hotspot guys would appreciate feedback on indy
>>> performance. As far as I know, all the indy opto stuff in this build
>>> is on its way to 7u12, but that window may still be open for
>>> additional patches.
>>>
>>> https://s3.amazonaws.com/openjdk/index.html
>>>
>>> - Charlie
>>> ___
>>> mlvm-dev mailing list
>>> mlvm-dev@openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>>
>>___
>>mlvm-dev mailing list
>>mlvm-dev@openjdk.java.net
>>http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

