Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-08 Thread Andrew Haley
On 4/8/19 9:36 AM, Roman Kennke wrote:
> On 4/7/19 7:18 PM, Roman Kennke wrote:
>>> On 4/2/19 10:12 PM, Roman Kennke wrote:
>>>>> - No more need for object equals barriers.
>>>>
>>>> I'm pleased about that. I really hated the AArch64 Shenandoah
>>>> CAS!
>>>
>>> I'm sorry to disappoint you, but the CAS barrier is still needed. The
>>> memory location may still legally hold a from-space reference, and
>>> comparing that to a to-space reference needs some special care. And
>>> yes, ZGC has a similar problem as far as I know.
>>
>> That's interesting. Could we not simply promote the reference in the
>> reference field we're CASing, then do a normal CAS?
> 
> Yes. But it requires two CASes in a row. Don't know if that is better
> than our loop, which is rarely even entered (normally the CAS-construct 
> goes via fast-path with a single CAS).

I think it's better, yes: it's significantly less complex to promote both
if needed and then do a normal CAS.
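A rough sketch of the two-CAS alternative, in a toy C++ model (the Obj type, its forwardee field and the function names are hypothetical, not HotSpot code). The first CAS promotes the reference stored in the field from its from-space copy to its to-space copy; the second is an ordinary pointer CAS with the expected value resolved as well. It assumes that once evacuation has started no thread stores a from-space reference into the field, so a promoted field stays promoted until the second CAS.

```cpp
#include <atomic>
#include <cassert>

// Toy object: forwardee points to itself until the object is evacuated
// (hypothetical model, not Shenandoah's object layout).
struct Obj {
  Obj* forwardee = this;
};

// Map a possibly from-space reference to its current copy.
inline Obj* resolve(Obj* o) { return o == nullptr ? nullptr : o->forwardee; }

// CAS #1: promote the reference stored in the field. If this CAS fails,
// another thread changed the field, which is fine for our purpose.
inline void promote_field(std::atomic<Obj*>& field) {
  Obj* seen = field.load();
  Obj* fwd = resolve(seen);
  if (fwd != seen) field.compare_exchange_strong(seen, fwd);
}

// CAS #2: an ordinary pointer CAS once both sides are to-space references.
inline bool promote_then_cas(std::atomic<Obj*>& field, Obj* expected,
                             Obj* desired) {
  promote_field(field);
  Obj* exp = resolve(expected);
  return field.compare_exchange_strong(exp, desired);
}
```

In this sketch the extra CAS only happens when the field actually holds a from-space reference; otherwise the promotion step is just a load and a compare.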

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-08 Thread Roman Kennke
On 4/7/19 7:18 PM, Roman Kennke wrote:
> > On 4/2/19 10:12 PM, Roman Kennke wrote:
> > > > - No more need for object equals barriers.
> > > 
> > > I'm pleased about that. I really hated the AArch64 Shenandoah
> > > CAS!
> > 
> > I'm sorry to disappoint you, but the CAS barrier is still needed. The
> > memory location may still legally hold a from-space reference, and
> > comparing that to a to-space reference needs some special care. And
> > yes, ZGC has a similar problem as far as I know.
> 
> That's interesting. Could we not simply promote the reference in the
> reference field we're CASing, then do a normal CAS?

Yes. But it requires two CASes in a row. Don't know if that is better
than our loop, which is rarely even entered (normally the CAS-construct 
goes via fast-path with a single CAS).

Roman



Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-08 Thread Andrew Haley
On 4/7/19 7:18 PM, Roman Kennke wrote:
> On 4/2/19 10:12 PM, Roman Kennke wrote:
>>> - No more need for object equals barriers.
>>
>> I'm pleased about that. I really hated the AArch64 Shenandoah CAS!
> 
> I'm sorry to disappoint you, but the CAS barrier is still needed. The
> memory location may still legally hold a from-space reference, and
> comparing that to a to-space reference needs some special care. And
> yes, ZGC has a similar problem as far as I know.

That's interesting. Could we not simply promote the reference in the
reference field we're CASing, then do a normal CAS?

-- 
Andrew Haley


Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-07 Thread Roman Kennke
On 4/2/19 10:12 PM, Roman Kennke wrote:
> > - No more need for object equals barriers.
> 
> I'm pleased about that. I really hated the AArch64 Shenandoah CAS!

I'm sorry to disappoint you, but the CAS barrier is still needed. The
memory location may still legally hold a from-space reference, and
comparing that to a to-space reference needs some special care. And
yes, ZGC has a similar problem as far as I know.
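To make the problem concrete, here is a toy C++ model (all names are hypothetical; the real barrier is hand-written per platform). A plain pointer CAS compares raw bits, so it fails when the field still holds the from-space copy while the caller expects the to-space copy of the same object. The sketch shows one way to handle that: a retry loop that only treats a failure as genuine when the two values resolve to different objects. It is not Shenandoah's actual CAS stub.

```cpp
#include <atomic>
#include <cassert>

// Toy object: forwardee points to itself until the object is evacuated.
struct Obj {
  Obj* forwardee = this;
};

// Map a possibly from-space reference to its current copy.
inline Obj* resolve(Obj* o) { return o == nullptr ? nullptr : o->forwardee; }

// Plain CAS: compares raw pointer bits only.
inline bool plain_cas(std::atomic<Obj*>& field, Obj* expected, Obj* desired) {
  return field.compare_exchange_strong(expected, desired);
}

// GC-aware CAS sketch: on a raw mismatch, check whether the value found in
// the field and the expected value are two copies of the same object. If so,
// retry with the value actually found; otherwise report a genuine failure.
inline bool gc_aware_cas(std::atomic<Obj*>& field, Obj* expected,
                         Obj* desired) {
  Obj* witness = expected;
  while (!field.compare_exchange_strong(witness, desired)) {
    // witness now holds the value currently stored in the field
    if (resolve(witness) != resolve(expected)) return false;
  }
  return true;
}
```

In the common case the first compare_exchange_strong succeeds and the loop body never runs, so the fast path is a single CAS.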

Roman




Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-07 Thread Andrew Haley
On 4/2/19 10:12 PM, Roman Kennke wrote:
> - No more need for object equals barriers.

I'm pleased about that. I really hated the AArch64 Shenandoah CAS!

-- 
Andrew Haley


Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-04 Thread Roman Kennke

>> The main difference is that instead of ensuring the correct invariant when
>> we access anything in the heap (e.g. read-barrier before reads,
>> write-barrier before writes, plus a bunch of other stuff), we ensure the
>> strong invariant on objects when they get loaded, by employing what is
>> currently our write-barrier.
>
> OK, so how does this work? Sure, the OOP load promotes an object to
> tospace, but how do you ensure that the OOP doesn't become stale when
> a later phase occurs?

Whenever we start an evacuation phase, we pre-evacuate everything that's
referenced by stack or registers, and update those stack slots and
registers. (We did that with the old barrier scheme too.)
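A toy C++ model of that step (hypothetical names; HotSpot walks thread stacks and register maps at a safepoint, which this does not attempt). Every root slot that points into the collection set is evacuated and overwritten with the to-space address, so once the phase starts no root refers to a from-space copy. Two slots that refer to the same object end up pointing at the same copy.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Toy object (hypothetical): forwardee points to itself until copied.
struct Obj {
  int payload = 0;
  bool in_cset = false;
  Obj* forwardee = this;
};

struct ToyHeap {
  std::deque<Obj> to_space;  // deque keeps addresses stable on push_back

  Obj* evacuate(Obj* from) {
    if (from->forwardee != from) return from->forwardee;  // already copied
    to_space.push_back(*from);
    Obj* to = &to_space.back();
    to->in_cset = false;
    to->forwardee = to;
    from->forwardee = to;  // install forwarding pointer
    return to;
  }
};

// Roots (stack slots, saved registers) modelled as addresses of the
// locations that hold references.
void pre_evacuate_roots(ToyHeap& heap, const std::vector<Obj**>& roots) {
  for (Obj** slot : roots) {
    Obj* o = *slot;
    if (o != nullptr && o->in_cset) {
      *slot = heap.evacuate(o);  // update the root in place
    }
  }
}
```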


Roman


Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-04 Thread Andrew Haley
On 4/2/19 10:12 PM, Roman Kennke wrote:
> The main difference is that instead of ensuring the correct invariant when
> we access anything in the heap (e.g. read-barrier before reads,
> write-barrier before writes, plus a bunch of other stuff), we ensure the
> strong invariant on objects when they get loaded, by employing what is
> currently our write-barrier.

OK, so how does this work? Sure, the OOP load promotes an object to
tospace, but how do you ensure that the OOP doesn't become stale when
a later phase occurs?

-- 
Andrew Haley


Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-03 Thread Aleksey Shipilev
On 4/3/19 7:13 PM, Roman Kennke wrote:
> Updated webrevs:
> Incremental:
> http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.01.diff/
> Full:
> http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.01/

Shenandoah parts look good.

-Aleksey



Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-03 Thread Vladimir Kozlov

Good (C2 part).

Thanks,
Vladimir




Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-03 Thread Roman Kennke

> I don't think it should be part of this cleanup.

Fair enough.
I have run several tests today, and removing the is_Phi() call doesn't
seem to negatively impact Shenandoah.

Updated webrevs:
Incremental:
http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.01.diff/
Full:
http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.01/

Ok now?

Thanks,
Roman


Please, file separate RFE to push this change with separate review and 
testing.


Thanks,
Vladimir

On 4/3/19 4:18 AM, Roland Westrelin wrote:


Hi Vladimir,


opto/loopnode.cpp new is_Phi check was added. Please, explain.


When we expand barriers, if we find a null check nearby we move the
barrier close to the null check so there's a better chance of converting
it to an implicit null check. That happens as part of a pass of loop
opts. I think that's where that change comes from but I don't remember
the details. In general we need the control that's assigned to a load to
not be too conservative.

Anyway, that change is not required for correctness. But it looks
reasonable to me.

Roland.



Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-03 Thread Vladimir Kozlov

I don't think it should be part of this cleanup.

Please file a separate RFE to push this change with separate review and testing.

Thanks,
Vladimir




Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-03 Thread Roland Westrelin


Hi Vladimir,

> In opto/loopnode.cpp, a new is_Phi check was added. Please explain.

When we expand barriers, if we find a null check nearby we move the
barrier close to the null check so there's a better chance of converting
it to an implicit null check. That happens as part of a pass of loop
opts. I think that's where that change comes from but I don't remember
the details. In general we need the control that's assigned to a load to
not be too conservative.

Anyway, that change is not required for correctness. But it looks
reasonable to me.

Roland.


Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-02 Thread Roman Kennke

Hi Vladimir,


> This is a nice cleanup :)
>
> 4294 lines changed: 977 ins; 2841 del; 476 mod

Yeah, right? :-)

> First, a general question. I don't understand why you need the
> (diagnostic) ShenandoahLoadRefBarrier flag if it is the new behavior and
> you can't use the old one because you removed it. I am definitely
> missing something here.

This is added for the same purpose as e.g. +/-ShenandoahWriteBarrier
before: to selectively disable barrier generation, for testing and
diagnostics.

> Thank you for thinking about Graal:
>
>>    ==> good for upcoming Graal (sup)port

:-)

> In opto/loopnode.cpp, a new is_Phi check was added. Please explain.

I'm not sure. I believe Roland did this. I'll let him comment on it.

> I don't see other issues in C2 code.

:-)

Thanks,
Roman

> Regards,
> Vladimir





Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-02 Thread Vladimir Kozlov

This is a nice cleanup :)

4294 lines changed: 977 ins; 2841 del; 476 mod

First, a general question. I don't understand why you need the (diagnostic) ShenandoahLoadRefBarrier flag if it is the new
behavior and you can't use the old one because you removed it. I am definitely missing something here.

Thank you for thinking about Graal:

> ==> good for upcoming Graal (sup)port

In opto/loopnode.cpp, a new is_Phi check was added. Please explain.

I don't see other issues in C2 code.

Regards,
Vladimir





Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-02 Thread Roman Kennke

Thanks, Erik!

Roman


> Build change looks good.
>
> /Erik





Re: RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-02 Thread Erik Joelsson

Build change looks good.

/Erik





RFR: JDK-8221766: Load-reference barriers for Shenandoah

2019-04-02 Thread Roman Kennke
(I am cross-posting this to build-dev and compiler-dev because this
contains some (trivial-ish) shared build and C2 changes. The C2 changes
are almost all reversals of Shenandoah-specific paths that were
introduced in the initial Shenandoah push.)

I would like to propose that we switch to what we have come to call the
'load reference barrier' as the new barrier scheme for Shenandoah GC.

The main difference is that instead of ensuring the correct invariant when
we access anything in the heap (e.g. read-barrier before reads,
write-barrier before writes, plus a bunch of other stuff), we ensure the
strong invariant on objects when they get loaded, by employing what is
currently our write-barrier.
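A minimal single-threaded sketch of the idea, assuming a toy heap in which each object carries a forwarding pointer to itself until it is copied. ToyHeap, lrb() and load_field() are hypothetical names, not the Shenandoah interface, and the real barrier must install the forwarding pointer atomically because mutator and GC threads can race to copy the same object. The barrier runs on the reference produced by a load, so whatever the mutator holds afterwards is the to-space copy, and it does nothing unless an evacuation is in progress.

```cpp
#include <cassert>
#include <deque>

// Toy object model (hypothetical, not HotSpot's layout).
struct Obj {
  int payload = 0;
  Obj* field = nullptr;   // one reference field
  bool in_cset = false;   // lives in a region that is being evacuated
  Obj* forwardee = this;  // points to itself until evacuated
};

struct ToyHeap {
  bool evac_in_progress = false;  // stands in for the global GC state check
  std::deque<Obj> to_space;       // deque keeps addresses stable on push_back

  Obj* evacuate(Obj* from) {
    to_space.push_back(*from);  // copy payload and fields
    Obj* to = &to_space.back();
    to->in_cset = false;
    to->forwardee = to;
    from->forwardee = to;       // install forwarding pointer
    return to;
  }

  // Load-reference barrier: applied to a reference right after it is loaded.
  Obj* lrb(Obj* obj) {
    if (!evac_in_progress || obj == nullptr) return obj;  // fast path
    if (!obj->in_cset) return obj;                       // not being moved
    if (obj->forwardee != obj) return obj->forwardee;    // already copied
    return evacuate(obj);                                // copy on first touch
  }

  Obj* load_field(Obj* holder) { return lrb(holder->field); }
};
```

Note that the field itself keeps the from-space reference in this sketch; every later load simply goes through the barrier again.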

The reasons why I'm proposing it are:
- simpler barrier interface
- easier to get good performance out of it
  ==> good for upcoming Graal (sup)port
- reduced maintenance burden (I intend to backport it all the way)

This has a number of advantages:
- Strong invariant means it's a lot easier to reason about the state of
GC and objects
- Much simpler barrier interface. In fact, a lot of stuff that we added
to barrier interfaces after JDK11 will now become unused: no need for
barriers on primitives, no need for object equality barriers, no need
for resolve barriers, etc. Also, some C2 stuff that we added for
Shenandoah can now be removed again. (Those are what comprise most
shared C2 changes.)
- Optimization is much easier: we currently put barriers 'down low'
close to their uses (which might be inside a hot loop), and then work
hard to optimize barriers upwards, e.g. out of loops. By using
load-ref-barriers, we would place them at the outermost site already.
Look how much code is removed from shenandoahSupport.cpp!
- No more need for object equals barriers.
- No more need for 'resolve' barriers.
- All barriers are now conditional, which opens up opportunities for
further optimization later on.
- We can re-enable the fast JNI getfield stuff.
- We no longer need the nmethod initializer that initializes embedded
oops to to-space.
- We no longer have the problem of using two registers for 'the same'
value (pre- and post-barrier).
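A toy C++ comparison of barrier placement (Holder, BarrierCount and both functions are hypothetical). With access barriers, a naive placement runs one barrier per access through the reference, inside the loop, until the optimizer manages to hoist it; a load-reference barrier runs once, where the reference is loaded. The counter only stands in for the barrier; no real barrier work is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Holder { std::vector<int> values; };
struct BarrierCount { long executed = 0; };

// Old scheme, naive placement: a barrier before every access made through
// the reference, i.e. inside the loop body.
long sum_with_access_barriers(Holder* const* slot, BarrierCount& c) {
  Holder* h = *slot;
  long sum = 0;
  for (std::size_t i = 0; i < h->values.size(); ++i) {
    ++c.executed;  // stands in for the read barrier on h
    sum += h->values[i];
  }
  return sum;
}

// Load-reference barrier: one barrier right after the reference is loaded,
// already outside the loop.
long sum_with_load_barrier(Holder* const* slot, BarrierCount& c) {
  Holder* h = *slot;
  ++c.executed;  // stands in for the load-reference barrier
  long sum = 0;
  for (int v : h->values) sum += v;
  return sum;
}
```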

The 'only' optimizations that we do in C2 are:
- Look upwards and see if the barrier input indicates we don't actually need
the barrier. This would be the case for constants, nulls, method parameters,
etc. (anything that is not like a load). Even though we insert barriers
after loads, you'd be surprised to see how many loads actually disappear.
- Look downwards to check uses of the barrier. If it doesn't feed into
anything that requires a barrier, we can remove it.
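The two rules above can be sketched over a toy IR. Kind, Node and barrier_is_needed() are hypothetical, and C2's real analysis handles many more node types. In this sketch a barrier is kept only if its input can come from a heap load and at least one use cares which copy of the object it sees. Treating a null check as a use that does not need the barrier is an assumption made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR; C2's node types are much richer.
enum class Kind { Constant, Null, Parameter, Load, Barrier, FieldAccess, CmpNull };

struct Node {
  Kind kind;
  const Node* input = nullptr;    // value the node consumes, if any
  std::vector<const Node*> uses;  // nodes consuming this node's value
};

// Look upwards: in this simplification only a value produced by a heap
// load can be a from-space reference; constants, nulls and parameters
// never need a barrier.
bool input_needs_barrier(const Node& value) { return value.kind == Kind::Load; }

// Look downwards: a null check gives the same answer for either copy of
// an object (assumption for the example); any other use needs the barrier.
bool use_needs_barrier(const Node& use) { return use.kind != Kind::CmpNull; }

bool barrier_is_needed(const Node& barrier) {
  if (barrier.input == nullptr || !input_needs_barrier(*barrier.input)) return false;
  for (const Node* use : barrier.uses) {
    if (use_needs_barrier(*use)) return true;
  }
  return false;
}
```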

Performance doesn't seem to be negatively impacted at all. Some
benchmarks benefit from it.

Testing: hotspot_gc_shenandoah, SPECjvm2008, SPECjbb2015, all of them
many times. This patch has baked in shenandoah/jdk for 1.5 months,
undergone our rigorous CI and received various bug fixes, and we have
had a close look at the generated code to verify it is sane. A
jdk/submit job is expected to come back good before push.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8221766
Webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.00/

Can I please get reviews for this change?

Roman