Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]
On 03/11/21 8:50 am, Jaikiran Pai wrote: Hello Alan, On 02/11/21 5:30 pm, Alan Bateman wrote: On 02/11/2021 06:38, Jaikiran Pai wrote: : Perhaps run 1 writing the hash code of each of the boot modules' descriptor into a file and then run 2 reading that same file to compare the hash codes would be one way to do it. But I think that would just make this test more complex, which I think is avoidable. I'm not sure that we really need this. There are several jlink tests that check reproducibility, we think one of them is failing intermittently with this issue already. So maybe we should leave build reproducibility to the jlink tests and expand them as needed. For ModuleDescriptor::hashCode then I was hoping we keep it simple and just test that the hashCode of the descriptors for modules in the boot layer matches the hashCode computed from re-parsing the module-info files, e.g. something simple like this: for (Module module : ModuleLayer.boot().modules()) { System.out.println(module); ModuleDescriptor descriptor1 = module.getDescriptor(); ModuleDescriptor descriptor2; try (InputStream in = module.getResourceAsStream("module-info.class")) { descriptor2 = ModuleDescriptor.read(in); } assertEquals(descriptor1, descriptor2); assertEquals(descriptor1.hashCode(), descriptor2.hashCode()); assertEquals(descriptor1.compareTo(descriptor2), 0); } It run can twice, with with the default, the other with -Xshare:off. Sounds reasonable. I've updated the PR to simplify the test and run it once with default CDS and once with -Xshare:off. Given this simplification, it now allows me to use testng for these comparisons, so I've moved it from regular "main" to testng test case. I hope that's OK. Test continues to pass with these changes and fails (as expected) when the fix in the source code of ModuleDescriptor class is rolled back. -Jaikiran
Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]
Hello Alan, On 02/11/21 5:30 pm, Alan Bateman wrote: On 02/11/2021 06:38, Jaikiran Pai wrote: : Perhaps run 1 writing the hash code of each of the boot modules' descriptor into a file and then run 2 reading that same file to compare the hash codes would be one way to do it. But I think that would just make this test more complex, which I think is avoidable. I'm not sure that we really need this. There are several jlink tests that check reproducibility, we think one of them is failing intermittently with this issue already. So maybe we should leave build reproducibility to the jlink tests and expand them as needed. For ModuleDescriptor::hashCode then I was hoping we keep it simple and just test that the hashCode of the descriptors for modules in the boot layer matches the hashCode computed from re-parsing the module-info files, e.g. something simple like this: for (Module module : ModuleLayer.boot().modules()) { System.out.println(module); ModuleDescriptor descriptor1 = module.getDescriptor(); ModuleDescriptor descriptor2; try (InputStream in = module.getResourceAsStream("module-info.class")) { descriptor2 = ModuleDescriptor.read(in); } assertEquals(descriptor1, descriptor2); assertEquals(descriptor1.hashCode(), descriptor2.hashCode()); assertEquals(descriptor1.compareTo(descriptor2), 0); } It run can twice, with with the default, the other with -Xshare:off. Sounds reasonable. I've updated the PR to simplify the test and run it once with default CDS and once with -Xshare:off. Given this simplification, it now allows me to use testng for these comparisons, so I've moved it from regular "main" to testng test case. I hope that's OK. One other thought on the change to ModuleDescriptor is that we could drop the modifiers from the computation of the hashCode of ModuleDescriptor, Requires, Exports, Opens. Summing the ordinals is okay but it doesn't give a good spread if you really have modules that have everything the same except for the modifiers then it doesn't really help. I gave it a bit of thought. However I let the current PR to continue to use ordinals for the hash code computation. The reason for that is this statement on the Object.hashCode() method which says: "the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables." So if two module descriptors have all components equals() except for their modifiers and if we don't use the modifiers in our hashCode() computation, then they end up producing the same hashCode() which won't violate the spec but will not satisfy the above statement about performance. So I decided to stick with using the ordinal modifiers in the hashCode() computation. However, I'm not an expert on the performance aspects of the Collections and how much using the modifiers for hashCode() will improve the performance in this case. Perhaps that's what you meant by it not giving a good (enough) spread? If so, do let me know and I'll go ahead and update the PR to remove using the modifiers in the hashCode() computation and also update the javadoc of each of those hashCode() methods which state that the modifiers are used in the hashCode() computation. -Jaikiran
Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]
On 02/11/2021 06:38, Jaikiran Pai wrote: : Perhaps run 1 writing the hash code of each of the boot modules' descriptor into a file and then run 2 reading that same file to compare the hash codes would be one way to do it. But I think that would just make this test more complex, which I think is avoidable. I'm not sure that we really need this. There are several jlink tests that check reproducibility, we think one of them is failing intermittently with this issue already. So maybe we should leave build reproducibility to the jlink tests and expand them as needed. For ModuleDescriptor::hashCode then I was hoping we keep it simple and just test that the hashCode of the descriptors for modules in the boot layer matches the hashCode computed from re-parsing the module-info files, e.g. something simple like this: for (Module module : ModuleLayer.boot().modules()) { System.out.println(module); ModuleDescriptor descriptor1 = module.getDescriptor(); ModuleDescriptor descriptor2; try (InputStream in = module.getResourceAsStream("module-info.class")) { descriptor2 = ModuleDescriptor.read(in); } assertEquals(descriptor1, descriptor2); assertEquals(descriptor1.hashCode(), descriptor2.hashCode()); assertEquals(descriptor1.compareTo(descriptor2), 0); } It run can twice, with with the default, the other with -Xshare:off. One other thought on the change to ModuleDescriptor is that we could drop the modifiers from the computation of the hashCode of ModuleDescriptor, Requires, Exports, Opens. Summing the ordinals is okay but it doesn't give a good spread if you really have modules that have everything the same except for the modifiers then it doesn't really help. -Alan
Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]
On 02/11/21 10:29 am, Jaikiran Pai wrote: Hello Alan, On 01/11/21 9:22 pm, Alan Bateman wrote: On Mon, 1 Nov 2021 03:10:45 GMT, Jaikiran Pai wrote: Can I please get a review for this change which fixes the issue reported in https://bugs.openjdk.java.net/browse/JDK-8275509? The `ModuleDescriptor.hashCode()` method uses the hash code of its various components to compute the final hash code. While doing so it ends up calling hashCode() on (collection of) various `enum` types. Since the hashCode() of enum types is generated from a call to `java.lang.Object.hashCode()`, their value isn't guaranteed to stay the same across multiple JVM runs. The commit here proposes to use the ordinal of the enum as the hash code. As Alan notes in the mailing list discussion, any changes to the ordinal of the enum (either due to addition of new value, removal of a value or just reordering existing values, all of which I think will be rare in these specific enum types) isn't expected to produce the same hash code across those changed runtimes and that should be okay. The rest of the ModuleDescriptor.hashCode() has been reviewed too and apart from calls to the enum types hashCode(), rest of the calls in that implementation, either directly or indirectly end up as calls on `java.lang.String.hashCode()` and as such aren't expected to cause non-deterministic values. A new jtreg test has been added to reproduce this issue and verify the fix. Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision: better method name in test class Thanks for the update to the test. Instead of launching 50 VMs, what would you think about tossing the use of ProcessTools and just changing the test description so that the test runs twice, once with the default, the second with CDS disabled, as, in: @run ModuleDescriptorHashCodeTest @run main/othervm -Xshare:off ModuleDescriptorHashCodeTest When I started off with this test, I had thought of a similar construct. However, in order to compare the hash code value generated across JVM runs (i.e. the one generated in @run 1 against the one generated in @run 2), I would need to somehow hold on to that dynamic value/result for comparison. Using the @run construct wouldn't allow me to do that. So I decided to use the ProcessTools library to have more control over it. If I missed some way to still use @run for such a test, do let me know, I'll change it accordingly. Perhaps run 1 writing the hash code of each of the boot modules' descriptor into a file and then run 2 reading that same file to compare the hash codes would be one way to do it. But I think that would just make this test more complex, which I think is avoidable. -Jaikiran
Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]
Hello Alan, On 01/11/21 9:22 pm, Alan Bateman wrote: On Mon, 1 Nov 2021 03:10:45 GMT, Jaikiran Pai wrote: Can I please get a review for this change which fixes the issue reported in https://bugs.openjdk.java.net/browse/JDK-8275509? The `ModuleDescriptor.hashCode()` method uses the hash code of its various components to compute the final hash code. While doing so it ends up calling hashCode() on (collection of) various `enum` types. Since the hashCode() of enum types is generated from a call to `java.lang.Object.hashCode()`, their value isn't guaranteed to stay the same across multiple JVM runs. The commit here proposes to use the ordinal of the enum as the hash code. As Alan notes in the mailing list discussion, any changes to the ordinal of the enum (either due to addition of new value, removal of a value or just reordering existing values, all of which I think will be rare in these specific enum types) isn't expected to produce the same hash code across those changed runtimes and that should be okay. The rest of the ModuleDescriptor.hashCode() has been reviewed too and apart from calls to the enum types hashCode(), rest of the calls in that implementation, either directly or indirectly end up as calls on `java.lang.String.hashCode()` and as such aren't expected to cause non-deterministic values. A new jtreg test has been added to reproduce this issue and verify the fix. Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision: better method name in test class Thanks for the update to the test. Instead of launching 50 VMs, what would you think about tossing the use of ProcessTools and just changing the test description so that the test runs twice, once with the default, the second with CDS disabled, as, in: @run ModuleDescriptorHashCodeTest @run main/othervm -Xshare:off ModuleDescriptorHashCodeTest When I started off with this test, I had thought of a similar construct. However, in order to compare the hash code value generated across JVM runs (i.e. the one generated in @run 1 against the one generated in @run 2), I would need to somehow hold on to that dynamic value/result for comparison. Using the @run construct wouldn't allow me to do that. So I decided to use the ProcessTools library to have more control over it. If I missed some way to still use @run for such a test, do let me know, I'll change it accordingly. The tests are run by many people every day, they are run continuously on all platforms in several CI systems. So I think you'll get much more variability that way rather than launching 50 VMs. I have updated the PR to reduce the runs from 50 to just 2. Once with default CDS and once with CDS disabled. I also feel like the test could be re-written to test the module descriptors of all modules in the boot layer. That would avoid having a dependency on the java.sql module as it would be testing every module. I think that's a good point and will make this test case cover more modules. I've updated the PR to now cover all these boot layer modules. -Jaikiran
Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]
On Mon, 1 Nov 2021 03:10:45 GMT, Jaikiran Pai wrote: >> Can I please get a review for this change which fixes the issue reported in >> https://bugs.openjdk.java.net/browse/JDK-8275509? >> >> The `ModuleDescriptor.hashCode()` method uses the hash code of its various >> components to compute the final hash code. While doing so it ends up calling >> hashCode() on (collection of) various `enum` types. Since the hashCode() of >> enum types is generated from a call to `java.lang.Object.hashCode()`, their >> value isn't guaranteed to stay the same across multiple JVM runs. >> >> The commit here proposes to use the ordinal of the enum as the hash code. As >> Alan notes in the mailing list discussion, any changes to the ordinal of the >> enum (either due to addition of new value, removal of a value or just >> reordering existing values, all of which I think will be rare in these >> specific enum types) isn't expected to produce the same hash code across >> those changed runtimes and that should be okay. >> >> The rest of the ModuleDescriptor.hashCode() has been reviewed too and apart >> from calls to the enum types hashCode(), rest of the calls in that >> implementation, either directly or indirectly end up as calls on >> `java.lang.String.hashCode()` and as such aren't expected to cause >> non-deterministic values. >> >> A new jtreg test has been added to reproduce this issue and verify the fix. > > Jaikiran Pai has updated the pull request incrementally with one additional > commit since the last revision: > > better method name in test class Thanks for the update to the test. Instead of launching 50 VMs, what would you think about tossing the use of ProcessTools and just changing the test description so that the test runs twice, once with the default, the second with CDS disabled, as, in: @run ModuleDescriptorHashCodeTest @run main/othervm -Xshare:off ModuleDescriptorHashCodeTest The tests are run by many people every day, they are run continuously on all platforms in several CI systems. So I think you'll get much more variability that way rather than launching 50 VMs. I also feel like the test could be re-written to test the module descriptors of all modules in the boot layer. That would avoid having a dependency on the java.sql module as it would be testing every module. - PR: https://git.openjdk.java.net/jdk/pull/6078
Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]
> Can I please get a review for this change which fixes the issue reported in > https://bugs.openjdk.java.net/browse/JDK-8275509? > > The `ModuleDescriptor.hashCode()` method uses the hash code of its various > components to compute the final hash code. While doing so it ends up calling > hashCode() on (collection of) various `enum` types. Since the hashCode() of > enum types is generated from a call to `java.lang.Object.hashCode()`, their > value isn't guaranteed to stay the same across multiple JVM runs. > > The commit here proposes to use the ordinal of the enum as the hash code. As > Alan notes in the mailing list discussion, any changes to the ordinal of the > enum (either due to addition of new value, removal of a value or just > reordering existing values, all of which I think will be rare in these > specific enum types) isn't expected to produce the same hash code across > those changed runtimes and that should be okay. > > The rest of the ModuleDescriptor.hashCode() has been reviewed too and apart > from calls to the enum types hashCode(), rest of the calls in that > implementation, either directly or indirectly end up as calls on > `java.lang.String.hashCode()` and as such aren't expected to cause > non-deterministic values. > > A new jtreg test has been added to reproduce this issue and verify the fix. Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision: better method name in test class - Changes: - all: https://git.openjdk.java.net/jdk/pull/6078/files - new: https://git.openjdk.java.net/jdk/pull/6078/files/e8f19d82..dee4197f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6078&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6078&range=07-08 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6078.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6078/head:pull/6078 PR: https://git.openjdk.java.net/jdk/pull/6078