Re: [External] : Re: JDK-8275509: (jlink) SystemModulesPlugin generates a jdk.internal.module.SystemModules$all.class which isn't reproducible

2021-11-07 Thread Jaikiran Pai

Hello Ioi,

On 02/11/21 12:13 am, Ioi Lam wrote:

Hi Jaikiran,

Thanks for writing the test case to explore the problems in this area.

Please see my comments below:
...

Generally speaking, CDS has two levels of archiving:

[1] archiving class metadata -- classes in the 
$JAVA_HOME/lib/classlist are considered to be frequently loaded 
classes. They are parsed from classfiles and stored into the CDS 
archive. At run time, instead of parsing the classes from classfiles, 
the VM directly use the pre-parsed version of these classes (as 
InstanceKlass* in C++).


At runtime, all such pre-parsed classes are initially in the "loaded" 
state. This means their static constructors will be executed when 
these classes are referenced for the first time. So as far as Java 
semantic is concerned, there's no difference between a pre-parsed 
class vs a class loaded from classfile.


E.g, the examples of loggers in static initializers will be executed 
at runtime.


[2] archiving heap objects

As shown in your test, we cannot arbitrarily archive the static fields 
that were initialized during -Xshare:dump, because they may have 
environment dependency.


The strategy used by CDS is to archive only a few static fields in a 
small number of carefully hand-picked system classes. You can see the 
list in


https://github.com/openjdk/jdk/blob/977154400be786c500f36ba14188bff79db57075/src/hotspot/share/cds/heapShared.cpp#L97 



Thank you for that link. That helped. So essentially even though the 
list of classes used for archiving class metadata isn't very tightly 
controlled, the list of objects which are archived in the heap is much 
more selective.


The reason why my PoC ended up reproducing this issue is because it just 
so happened that I selected a class (ModuleDescriptor) which 
(indirectly) is hand-picked in that list of classes that can end up in 
the archived heap.


These static fields are stored into the CDS archive. At run time, 
these fields are essentially copied into the Java heap, and then 
picked up by code like this:


https://github.com/openjdk/jdk/blob/977154400be786c500f36ba14188bff79db57075/src/java.base/share/classes/jdk/internal/module/ModuleBootstrap.java#L163 



    public static ModuleLayer boot() {
    Counters.start();

    ModuleLayer bootLayer;
    ArchivedBootLayer archivedBootLayer = ArchivedBootLayer.get();
    if (archivedBootLayer != null) {
    assert canUseArchivedBootLayer();
    bootLayer = archivedBootLayer.bootLayer();
    BootLoader.getUnnamedModule(); // trigger  of 
BootLoader.
CDS.defineArchivedModules(ClassLoaders.platformClassLoader(), 
ClassLoaders.appClassLoader());


    // assume boot layer has at least one module providing a 
service

    // that is mapped to the application class loader.
    JLA.bindToLoader(bootLayer, ClassLoaders.appClassLoader());
    } else {
    bootLayer = boot2();
    }

In the case of the module graph, we remove things that depend on the 
environment (such as CLASSPATH)


https://github.com/openjdk/jdk/blob/977154400be786c500f36ba14188bff79db57075/src/hotspot/share/cds/heapShared.cpp#L190 



The remaining parts of the archived module graph only depend on the 
following system properties:


    private static boolean canUseArchivedBootLayer() {
    return getProperty("jdk.module.upgrade.path") == null &&
   getProperty("jdk.module.path") == null &&
   getProperty("jdk.module.patch.0") == null &&   // 
--patch-module
   getProperty("jdk.module.main") == null &&  // 
--module
   getProperty("jdk.module.addmods.0") == null &&    // 
--add-modules
   getProperty("jdk.module.limitmods") == null && // 
--limit-modules
   getProperty("jdk.module.addreads.0") == null &&    // 
--add-reads
   getProperty("jdk.module.addexports.0") == null &&  // 
--add-exports
   getProperty("jdk.module.addopens.0") == null; // 
--add-opens

    }

As a result, we will invalidate the archived module graph if these 
properties differ between dump time and run time. The Java code above 
only asserts that the check has already been done. The actual check is 
done in here:


https://github.com/openjdk/jdk/blob/977154400be786c500f36ba14188bff79db57075/src/hotspot/share/runtime/arguments.cpp#L1339 



Understood.



Am I misunderstanding the severity of this issue or is this serious 
enough that -Xshare:off should be default (or heap archiving disabled 
somehow by default till this is fixed) to prevent issues like these 
which can at the minimal be hard to debug bugs and on the extreme end 
perhaps leak things from the build server where the JDK was built? I 
guess it all boils down to which exact classes get 
replaced/mapped/copied over from the heap archive? Is there a 
definitive list that can be derived in the current JDK?




I believe the currently implementation is still safe 

Re: [External] : Re: JDK-8275509: (jlink) SystemModulesPlugin generates a jdk.internal.module.SystemModules$all.class which isn't reproducible

2021-11-01 Thread Ioi Lam

Hi Jaikiran,

Thanks for writing the test case to explore the problems in this area.

Please see my comments below:

On 10/29/21 5:55 AM, Jaikiran Pai wrote:

Hello Ioi (and others),

On 22/10/21 1:39 pm, Jaikiran Pai wrote:

Hello Ioi,

On 22/10/21 12:29 pm, Ioi Lam wrote:



On 10/21/21 9:09 PM, Jaikiran Pai wrote:

Hello Ioi,




This is my initial analysis of the problem.

>>> Can anyone think of similar problems that may happen 
elsewhere?


The static constructors of enum classes are executed at both CDS 
dump time and run time. E.g.,


    public enum Modifier {
    OPEN
    }

The  method essentially does this:

    public static final Modifier OPEN = new Modifier("OPEN");

If a reference of Modifier.OPEN is stored inside the CDS archived 
heap during dump time, it will be different than the value of 
Modifier.OPEN that is re-created at runtime by the execution of 
Modifier.


I have almost next to nothing knowledge about CDS internals. My 
only understanding of it is based on some documentation that I have 
read. One of them being this one 
https://docs.oracle.com/en/java/javase/17/vm/class-data-sharing.html#GUID-7EAA3411-8CF0-4D19-BD05-DF5E1780AA91.


Based on that documentation (and other similar ones), it was my 
understanding that CDS was meant to store/share class "metadata" 
like it states in that doc:


"When the JVM starts, the shared archive is memory-mapped to allow 
sharing of read-only JVM metadata for these classes among multiple 
JVM processes."


But from what you explain in that enum example, it looks like it 
also stores class instance data that is computed at build time on 
the host where the JDK image was generated? Did I understand it 
correctly? Is this only for enums or does it also store the static 
initialization data of "class" types too? If it does store the 
static init data of class types too, then wouldn't such data be 
host/build time specific and as such the classes that need to be 
enrolled into the default CDS archive of the JDK should be very 
selective (by reviewing what they do in their static init)? Like I 
said, I haven't looked into this in detail so perhaps it already is 
selective in the JDK build?




Hi Jaikiran,


Thank you very much for the detailed response.

CDS also has the ability to archive Java heap object. Since 
https://bugs.openjdk.java.net/browse/JDK-8244778 , we have archived 
the entire module graph to improve start-up time. At run time, the 
module graph (as well as other archived heap objects) are loaded 
from the CDS archive and put into the Java heap (either through 
memory mapping or copying).


That is interesting and something that I hadn't known.



You can see the related code in 
jdk.internal.module.ModuleBootstrap::boot()
I just had a look at it and it's quite elaborate and it'll take a me 
while to fully grasp it (if at all) given its understandable complexity.


When the module system has started up, the module graph will 
reference a copy of the OPEN enum object that was created as part of 
the archive. However, the Modifier. will also be executed at 
VM start-up, causing a second copy of the OPEN enum object to be 
stored into the static field Modified::OPEN.


Thank you for that detail. That helps me understand this a bit more 
(and opens a few questions). To be clear - the VM startup code which 
creates that other copy, ends up creating that copy because that 
piece of initialization happens even before the module system has 
fully started up and created those references from the archived 
state? Otherwise, the classloaders I believe would be smart enough to 
not run that static init again, had the module system with that graph 
from the archived state been fully "up"?


So would this mean that this not just impacts enums but essentially 
every class referenced within the module system (of just boot layer?) 
that has state which is initialized during static init? To be more 
precise, consider the very common example of loggers which are 
typically static initialized and stored in a static (final) field:


private static final java.util.logger.Logger logger = 
Logger.getLogger(SomeClass.class);


If the boot layer module graph has any classes which has state like 
this, it would then mean that if such classes do get initialized very 
early on during VM startup, then they too are impacted and the module 
graph holding instances of such classes will end up using a different 
instance for such fields as compared to the rest of the application 
code?


In essence, such classes which get accessed early (before module 
system with the archived graph is "up") during VM startup can end up 
_virtually_ having their static initialization run twice (I 
understand it won't be run twice, but that's the end result, isn't it)?


I was really curious why this was only applicable to enums and why 
other static initialization (like the one I explained above) wasn't 
impacted. So I decided to do a small PoC. To come up with an