Such an assumption (in-order execution of statements) would be invalid even 
with the current memory model. There's nothing to stop the compilers from 
re-ordering the adds and multiplies so that they fill each other's pipeline 
delays.

So I don't think AVX2 brings anything new to the table in terms of perturbing 
the memory model.

----- Original Message -----
From: "John Platts" <john_pla...@hotmail.com>
To: jdk8-dev@openjdk.java.net
Sent: Thursday, December 8, 2011 9:25:41 AM
Subject: Optimizing arithmetic operations on processors with AVX2 support


Here is an example of a class with an operation that can be optimized on a 
processor with AVX2 support:

class ExampleClass {
    public void ExampleOperation(ExampleClass y) {
        a += y.a;
        b *= y.b;
        c += y.c;
        d += y.d;
        e += y.e;
        f *= y.f;
        g *= y.g;
        h *= y.h;
    }

    private int a;
    private int b;
    private int c;
    private int d;
    private int e;
    private int f;
    private int g;
    private int h;
}

The AVX2 instruction set includes gather instructions that can be used to read 
from primitive fields that are not contiguous to each other. The AVX2 
instruction set will be implemented on the Intel Haswell microarchitecture 
processors.

In the example above, a JVM running on a processor with the AVX2 instruction 
set can optimize the ExampleOperation method as follows:

- Reading the a, c, d, and e fields of both this and y using the VPGATHERDD 
  instruction.
- Performing the 4 addition operations simultaneously using the PADDD 
  instruction.
- Storing the results of the addition operations in a, c, d, and e using the 
  PEXTRD instruction.
- Reading the b, f, g, and h fields of both this and y using the VPGATHERDD 
  instruction.
- Performing the 4 multiplication operations simultaneously using the PMULLD 
  instruction.
- Storing the results of the multiplication operations in b, f, g, and h using 
  the PEXTRD instruction.
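
To make the regrouping concrete, here is a minimal sketch in plain Java that 
mimics the steps above with small arrays standing in for vector lanes. It does 
not issue any AVX2 instructions, and the names GroupedExample, sequential, 
grouped, and values are hypothetical, chosen only for this illustration. It 
shows that, seen from a single thread, doing the four additions and then the 
four multiplications gives the same field values as the source order.

```java
// Hypothetical illustration only: arrays stand in for SIMD lanes.
final class GroupedExample {
    int a, b, c, d, e, f, g, h;

    GroupedExample(int a, int b, int c, int d, int e, int f, int g, int h) {
        this.a = a; this.b = b; this.c = c; this.d = d;
        this.e = e; this.f = f; this.g = g; this.h = h;
    }

    // Source order, exactly as written in ExampleOperation.
    void sequential(GroupedExample y) {
        a += y.a; b *= y.b; c += y.c; d += y.d;
        e += y.e; f *= y.f; g *= y.g; h *= y.h;
    }

    // Regrouped order: all four additions first, then all four multiplications.
    void grouped(GroupedExample y) {
        int[] addThis = {a, c, d, e};           // "gather" from this
        int[] addThat = {y.a, y.c, y.d, y.e};   // "gather" from y
        for (int i = 0; i < 4; i++) {
            addThis[i] += addThat[i];           // lane-wise add
        }
        a = addThis[0]; c = addThis[1]; d = addThis[2]; e = addThis[3];

        int[] mulThis = {b, f, g, h};           // "gather" from this
        int[] mulThat = {y.b, y.f, y.g, y.h};   // "gather" from y
        for (int i = 0; i < 4; i++) {
            mulThis[i] *= mulThat[i];           // lane-wise multiply
        }
        b = mulThis[0]; f = mulThis[1]; g = mulThis[2]; h = mulThis[3];
    }

    // Field values in declaration order, for easy comparison.
    int[] values() {
        return new int[] {a, b, c, d, e, f, g, h};
    }
}
```

The two methods agree because each statement touches a different field, so 
there are no data dependencies between them. Another thread reading the fields 
without synchronization could still see the intermediate states in a different 
order.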
This optimization is perfectly legal under the Java Memory Model, since there 
are no volatile reads or volatile writes. However, this optimization would be 
illegal if a, b, c, d, e, f, g, or h were declared as volatile fields. This 
optimization must also respect constraints imposed by synchronized blocks, 
volatile reads, volatile writes, method calls, data dependencies, and strictfp 
semantics. This optimization would also need to be disabled if the method is 
being debugged by a Java debugger, as the Java debugger can step through each 
operation individually.

The point I am trying to illustrate is that Java programmers should not assume 
that the arithmetic operations performed by the ExampleOperation method execute 
in the sequence shown in the source code. This example also illustrates the 
importance of proper synchronization. Will this optimization get implemented in 
the HotSpot VM in the future?
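
As a rough sketch of what proper synchronization could look like here, the 
class below guards a simplified version of the additions with the object's 
monitor. Everything in it (SyncExample, addFrom, snapshot, runDemo) is a 
hypothetical name for illustration, and it takes plain int arguments instead 
of a second object so that only one lock is involved. Under that assumption, a 
reader that goes through snapshot() never sees a half-applied update, no 
matter how the JVM orders the additions inside addFrom.

```java
// Hypothetical illustration: one monitor guards all four fields.
final class SyncExample {
    private int a, c, d, e;

    // The four additions may be reordered or vectorized inside the lock;
    // other threads using the same lock cannot observe the difference.
    synchronized void addFrom(int ya, int yc, int yd, int ye) {
        a += ya;
        c += yc;
        d += yd;
        e += ye;
    }

    // Reads all four fields under the same lock, so the values are consistent.
    synchronized int[] snapshot() {
        return new int[] {a, c, d, e};
    }

    // Two writer threads apply the same update 10000 times each.
    static int[] runDemo() {
        SyncExample x = new SyncExample();
        Runnable writer = () -> {
            for (int i = 0; i < 10000; i++) {
                x.addFrom(1, 2, 3, 4);
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return x.snapshot();
    }
}
```

Without the synchronized keyword, the += updates from the two threads could 
interleave and lose increments, and a reader could see some fields updated and 
others not.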
                     
