Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]

2021-11-02 Thread Jaikiran Pai



On 03/11/21 8:50 am, Jaikiran Pai wrote:

Hello Alan,

On 02/11/21 5:30 pm, Alan Bateman wrote:

On 02/11/2021 06:38, Jaikiran Pai wrote:

:

Perhaps run 1 writing the hash code of each of the boot modules' 
descriptor into a file and then run 2 reading that same file to 
compare the hash codes would be one way to do it. But I think that 
would just make this test more complex, which I think is avoidable.


I'm not sure that we really need this. There are several jlink tests 
that check reproducibility, we think one of them is failing 
intermittently with this issue already. So maybe we should leave 
build reproducibility to the jlink tests and expand them as needed.


For ModuleDescriptor::hashCode then I was hoping we keep it simple 
and just test that the hashCode of the descriptors for modules in the 
boot layer matches the hashCode computed from re-parsing the 
module-info files, e.g. something simple like this:


  for (Module module : ModuleLayer.boot().modules()) {
    System.out.println(module);
    ModuleDescriptor descriptor1 = module.getDescriptor();
    ModuleDescriptor descriptor2;
    try (InputStream in = 
module.getResourceAsStream("module-info.class")) {

    descriptor2 = ModuleDescriptor.read(in);
    }
    assertEquals(descriptor1, descriptor2);
    assertEquals(descriptor1.hashCode(), 
descriptor2.hashCode());

    assertEquals(descriptor1.compareTo(descriptor2), 0);
    }

It run can twice, with with the default, the other with -Xshare:off.


Sounds reasonable. I've updated the PR to simplify the test and run it 
once with default CDS and once with -Xshare:off. Given this 
simplification, it now allows me to use testng for these comparisons, 
so I've moved it from regular "main" to testng test case. I hope 
that's OK.


Test continues to pass with these changes and fails (as expected) when 
the fix in the source code of ModuleDescriptor class is rolled back.


-Jaikiran



Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]

2021-11-02 Thread Jaikiran Pai

Hello Alan,

On 02/11/21 5:30 pm, Alan Bateman wrote:

On 02/11/2021 06:38, Jaikiran Pai wrote:

:

Perhaps run 1 writing the hash code of each of the boot modules' 
descriptor into a file and then run 2 reading that same file to 
compare the hash codes would be one way to do it. But I think that 
would just make this test more complex, which I think is avoidable.


I'm not sure that we really need this. There are several jlink tests 
that check reproducibility, we think one of them is failing 
intermittently with this issue already. So maybe we should leave build 
reproducibility to the jlink tests and expand them as needed.


For ModuleDescriptor::hashCode then I was hoping we keep it simple and 
just test that the hashCode of the descriptors for modules in the boot 
layer matches the hashCode computed from re-parsing the module-info 
files, e.g. something simple like this:


  for (Module module : ModuleLayer.boot().modules()) {
    System.out.println(module);
    ModuleDescriptor descriptor1 = module.getDescriptor();
    ModuleDescriptor descriptor2;
    try (InputStream in = 
module.getResourceAsStream("module-info.class")) {

    descriptor2 = ModuleDescriptor.read(in);
    }
    assertEquals(descriptor1, descriptor2);
    assertEquals(descriptor1.hashCode(), descriptor2.hashCode());
    assertEquals(descriptor1.compareTo(descriptor2), 0);
    }

It run can twice, with with the default, the other with -Xshare:off.


Sounds reasonable. I've updated the PR to simplify the test and run it 
once with default CDS and once with -Xshare:off. Given this 
simplification, it now allows me to use testng for these comparisons, so 
I've moved it from regular "main" to testng test case. I hope that's OK.




One other thought on the change to ModuleDescriptor is that we could 
drop the modifiers from the computation of the hashCode of 
ModuleDescriptor, Requires, Exports, Opens. Summing the ordinals is 
okay but it doesn't give a good spread if you really have modules that 
have everything the same except for the modifiers then it doesn't 
really help.


I gave it a bit of thought. However I let the current PR to continue to 
use ordinals for the hash code computation. The reason for that is this 
statement on the Object.hashCode() method which says:


"the programmer should be aware that producing distinct integer results 
for unequal objects may improve the performance of hash tables."


So if two module descriptors have all components equals() except for 
their modifiers and if we don't use the modifiers in our hashCode() 
computation, then they end up producing the same hashCode() which won't 
violate the spec but will not satisfy the above statement about performance.


So I decided to stick with using the ordinal modifiers in the hashCode() 
computation. However, I'm not an expert on the performance aspects of 
the Collections and how much using the modifiers for hashCode() will 
improve the performance in this case. Perhaps that's what you meant by 
it not giving a good (enough) spread? If so, do let me know and I'll go 
ahead and update the PR to remove using the modifiers in the hashCode() 
computation and also update the javadoc of each of those hashCode() 
methods which state that the modifiers are used in the hashCode() 
computation.


-Jaikiran



Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]

2021-11-02 Thread Alan Bateman

On 02/11/2021 06:38, Jaikiran Pai wrote:

:

Perhaps run 1 writing the hash code of each of the boot modules' 
descriptor into a file and then run 2 reading that same file to 
compare the hash codes would be one way to do it. But I think that 
would just make this test more complex, which I think is avoidable.


I'm not sure that we really need this. There are several jlink tests 
that check reproducibility, we think one of them is failing 
intermittently with this issue already. So maybe we should leave build 
reproducibility to the jlink tests and expand them as needed.


For ModuleDescriptor::hashCode then I was hoping we keep it simple and 
just test that the hashCode of the descriptors for modules in the boot 
layer matches the hashCode computed from re-parsing the module-info 
files, e.g. something simple like this:


  for (Module module : ModuleLayer.boot().modules()) {
    System.out.println(module);
    ModuleDescriptor descriptor1 = module.getDescriptor();
    ModuleDescriptor descriptor2;
    try (InputStream in = 
module.getResourceAsStream("module-info.class")) {

    descriptor2 = ModuleDescriptor.read(in);
    }
    assertEquals(descriptor1, descriptor2);
    assertEquals(descriptor1.hashCode(), descriptor2.hashCode());
    assertEquals(descriptor1.compareTo(descriptor2), 0);
    }

It run can twice, with with the default, the other with -Xshare:off.

One other thought on the change to ModuleDescriptor is that we could 
drop the modifiers from the computation of the hashCode of 
ModuleDescriptor, Requires, Exports, Opens. Summing the ordinals is okay 
but it doesn't give a good spread if you really have modules that have 
everything the same except for the modifiers then it doesn't really help.


-Alan




Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]

2021-11-01 Thread Jaikiran Pai



On 02/11/21 10:29 am, Jaikiran Pai wrote:

Hello Alan,

On 01/11/21 9:22 pm, Alan Bateman wrote:

On Mon, 1 Nov 2021 03:10:45 GMT, Jaikiran Pai  wrote:

Can I please get a review for this change which fixes the issue 
reported in https://bugs.openjdk.java.net/browse/JDK-8275509?


The `ModuleDescriptor.hashCode()` method uses the hash code of its 
various components to compute the final hash code. While doing so 
it ends up calling hashCode() on (collection of) various `enum` 
types. Since the hashCode() of enum types is generated from a call 
to `java.lang.Object.hashCode()`, their value isn't guaranteed to 
stay the same across multiple JVM runs.


The commit here proposes to use the ordinal of the enum as the hash 
code. As Alan notes in the mailing list discussion, any changes to 
the ordinal of the enum (either due to addition of new value, 
removal of a value or just reordering existing values, all of which 
I think will be rare in these specific enum types) isn't expected 
to produce the same hash code across those changed runtimes and 
that should be okay.


The rest of the ModuleDescriptor.hashCode() has been reviewed too 
and apart from calls to the enum types hashCode(), rest of the 
calls in that implementation, either directly or indirectly end up 
as calls on `java.lang.String.hashCode()` and as such aren't 
expected to cause non-deterministic values.


A new jtreg test has been added to reproduce this issue and verify 
the fix.
Jaikiran Pai has updated the pull request incrementally with one 
additional commit since the last revision:


   better method name in test class
Thanks for the update to the test. Instead of launching 50 VMs, what 
would you think about tossing the use of ProcessTools and just 
changing the test description so that the test runs twice, once with 
the default, the second with CDS disabled, as, in:


@run ModuleDescriptorHashCodeTest
@run main/othervm -Xshare:off ModuleDescriptorHashCodeTest


When I started off with this test, I had thought of a similar 
construct. However, in order to compare the hash code value generated 
across JVM runs (i.e. the one generated in @run 1 against the one 
generated in @run 2), I would need to somehow hold on to that dynamic 
value/result for comparison. Using the @run construct wouldn't allow 
me to do that. So I decided to use the ProcessTools library to have 
more control over it. If I missed some way to still use @run for such 
a test, do let me know, I'll change it accordingly.


Perhaps run 1 writing the hash code of each of the boot modules' 
descriptor into a file and then run 2 reading that same file to compare 
the hash codes would be one way to do it. But I think that would just 
make this test more complex, which I think is avoidable.


-Jaikiran



Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]

2021-11-01 Thread Jaikiran Pai

Hello Alan,

On 01/11/21 9:22 pm, Alan Bateman wrote:

On Mon, 1 Nov 2021 03:10:45 GMT, Jaikiran Pai  wrote:


Can I please get a review for this change which fixes the issue reported in 
https://bugs.openjdk.java.net/browse/JDK-8275509?

The `ModuleDescriptor.hashCode()` method uses the hash code of its various 
components to compute the final hash code. While doing so it ends up calling 
hashCode() on (collection of) various `enum` types. Since the hashCode() of 
enum types is generated from a call to `java.lang.Object.hashCode()`, their 
value isn't guaranteed to stay the same across multiple JVM runs.

The commit here proposes to use the ordinal of the enum as the hash code. As 
Alan notes in the mailing list discussion, any changes to the ordinal of the 
enum (either due to addition of new value, removal of a value or just 
reordering existing values, all of which I think will be rare in these specific 
enum types) isn't expected to produce the same hash code across those changed 
runtimes and that should be okay.

The rest of the ModuleDescriptor.hashCode() has been reviewed too and apart 
from calls to the enum types hashCode(), rest of the calls in that 
implementation, either directly or indirectly end up as calls on 
`java.lang.String.hashCode()` and as such aren't expected to cause 
non-deterministic values.

A new jtreg test has been added to reproduce this issue and verify the fix.

Jaikiran Pai has updated the pull request incrementally with one additional 
commit since the last revision:

   better method name in test class

Thanks for the update to the test. Instead of launching 50 VMs, what would you 
think about tossing the use of ProcessTools and just changing the test 
description so that the test runs twice, once with the default, the second with 
CDS disabled, as, in:

@run ModuleDescriptorHashCodeTest
@run main/othervm -Xshare:off ModuleDescriptorHashCodeTest


When I started off with this test, I had thought of a similar construct. 
However, in order to compare the hash code value generated across JVM 
runs (i.e. the one generated in @run 1 against the one generated in @run 
2), I would need to somehow hold on to that dynamic value/result for 
comparison. Using the @run construct wouldn't allow me to do that. So I 
decided to use the ProcessTools library to have more control over it. If 
I missed some way to still use @run for such a test, do let me know, 
I'll change it accordingly.



The tests are run by many people every day, they are run continuously on all 
platforms in several CI systems. So I think you'll get much more variability 
that way rather than launching 50 VMs.


I have updated the PR to reduce the runs from 50 to just 2. Once with 
default CDS and once with CDS disabled.



I also feel like the test could be re-written to test the module descriptors of 
all modules in the boot layer. That would avoid having a dependency on the 
java.sql module as it would be testing every module.


I think that's a good point and will make this test case cover more 
modules. I've updated the PR to now cover all these boot layer modules.


-Jaikiran




Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]

2021-11-01 Thread Alan Bateman
On Mon, 1 Nov 2021 03:10:45 GMT, Jaikiran Pai  wrote:

>> Can I please get a review for this change which fixes the issue reported in 
>> https://bugs.openjdk.java.net/browse/JDK-8275509?
>> 
>> The `ModuleDescriptor.hashCode()` method uses the hash code of its various 
>> components to compute the final hash code. While doing so it ends up calling 
>> hashCode() on (collection of) various `enum` types. Since the hashCode() of 
>> enum types is generated from a call to `java.lang.Object.hashCode()`, their 
>> value isn't guaranteed to stay the same across multiple JVM runs.
>> 
>> The commit here proposes to use the ordinal of the enum as the hash code. As 
>> Alan notes in the mailing list discussion, any changes to the ordinal of the 
>> enum (either due to addition of new value, removal of a value or just 
>> reordering existing values, all of which I think will be rare in these 
>> specific enum types) isn't expected to produce the same hash code across 
>> those changed runtimes and that should be okay. 
>> 
>> The rest of the ModuleDescriptor.hashCode() has been reviewed too and apart 
>> from calls to the enum types hashCode(), rest of the calls in that 
>> implementation, either directly or indirectly end up as calls on 
>> `java.lang.String.hashCode()` and as such aren't expected to cause 
>> non-deterministic values.
>> 
>> A new jtreg test has been added to reproduce this issue and verify the fix.
>
> Jaikiran Pai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   better method name in test class

Thanks for the update to the test. Instead of launching 50 VMs, what would you 
think about tossing the use of ProcessTools and just changing the test 
description so that the test runs twice, once with the default, the second with 
CDS disabled, as, in:

@run ModuleDescriptorHashCodeTest
@run main/othervm -Xshare:off ModuleDescriptorHashCodeTest

The tests are run by many people every day, they are run continuously on all 
platforms in several CI systems. So I think you'll get much more variability 
that way rather than launching 50 VMs.

I also feel like the test could be re-written to test the module descriptors of 
all modules in the boot layer. That would avoid having a dependency on the 
java.sql module as it would be testing every module.

-

PR: https://git.openjdk.java.net/jdk/pull/6078


Re: RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]

2021-10-31 Thread Jaikiran Pai
> Can I please get a review for this change which fixes the issue reported in 
> https://bugs.openjdk.java.net/browse/JDK-8275509?
> 
> The `ModuleDescriptor.hashCode()` method uses the hash code of its various 
> components to compute the final hash code. While doing so it ends up calling 
> hashCode() on (collection of) various `enum` types. Since the hashCode() of 
> enum types is generated from a call to `java.lang.Object.hashCode()`, their 
> value isn't guaranteed to stay the same across multiple JVM runs.
> 
> The commit here proposes to use the ordinal of the enum as the hash code. As 
> Alan notes in the mailing list discussion, any changes to the ordinal of the 
> enum (either due to addition of new value, removal of a value or just 
> reordering existing values, all of which I think will be rare in these 
> specific enum types) isn't expected to produce the same hash code across 
> those changed runtimes and that should be okay. 
> 
> The rest of the ModuleDescriptor.hashCode() has been reviewed too and apart 
> from calls to the enum types hashCode(), rest of the calls in that 
> implementation, either directly or indirectly end up as calls on 
> `java.lang.String.hashCode()` and as such aren't expected to cause 
> non-deterministic values.
> 
> A new jtreg test has been added to reproduce this issue and verify the fix.

Jaikiran Pai has updated the pull request incrementally with one additional 
commit since the last revision:

  better method name in test class

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6078/files
  - new: https://git.openjdk.java.net/jdk/pull/6078/files/e8f19d82..dee4197f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6078&range=08
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6078&range=07-08

  Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6078.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6078/head:pull/6078

PR: https://git.openjdk.java.net/jdk/pull/6078