subject:"Re\: JDK\-8210280 \- Unnecessary reallocation when invoking HashMap.putAll\(\)"

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2019-02-10 Thread Michal Vala


Hi Martin,

thank you for reviewing and sponsoring this!

On 2/8/19 10:48 PM, Martin Buchholz wrote:

and ... finally committed
(sorry as always for the sloth pace)

On Wed, Feb 6, 2019 at 10:28 AM Martin Buchholz  wrote:


Finally got around to doing some work on this.
I'm happy with this latest revision:


https://cr.openjdk.java.net/~martin/webrevs/jdk/jsr166-integration/HashMap-resize/index.html

I did some more work on the microbenchmark, in part to force myself to
learn about jmh.
It now tests LinkedHashMap as well.
I replaced use of Random with ThreadLocalRandom

 @Param("100")
 private int size;

 @Param
 private MapType mapType;

 public enum MapType {
 HASH_MAP,
 LINKED_HASH_MAP,
 }



On Thu, Jan 3, 2019 at 12:31 PM Michal Vala  wrote:


Hi Martin,

can we please finish this review?

On 12/19/18 6:32 PM, Michal Vala wrote:



On 12/19/18 4:15 PM, Martin Buchholz wrote:

On Wed, Dec 19, 2018 at 6:59 AM Roger Riggs 

wrote:



Hi Martin,

It is also useful and conventional to print the seed of the random
so that if necessary it can be reproduced.



For many years, we've been using ThreadLocalRandom for testing, and

that

does not allow setting a seed.

I remain unconvinced that saving a seed has value in the real world.

When

a randomized test fails, running it with sufficient iterations has

always

worked for me.



What's the reason behind using ThreadLocalRandom?

In my opinion, reproducing the issue is key. One failure of randomized

test run

might be caused by one issue, second run due to another issue. How we

reproduce

then and how we know even how many issues we have? When we're running

random

tests until it pass, it might even hide the issue.

They sure have good value on reveal the issue, but then we have to know

how to

reproduce and what we're searching for.

If this case would not require ThreadLocalRandom and Random is enough,

I'd like

to use that because of benefits I've mentioned.



--
Michal Vala
OpenJDK QE
Red Hat Czech







--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2019-02-08 Thread Martin Buchholz

and ... finally committed
(sorry as always for the sloth pace)

On Wed, Feb 6, 2019 at 10:28 AM Martin Buchholz  wrote:

> Finally got around to doing some work on this.
> I'm happy with this latest revision:
>
>
> https://cr.openjdk.java.net/~martin/webrevs/jdk/jsr166-integration/HashMap-resize/index.html
>
> I did some more work on the microbenchmark, in part to force myself to
> learn about jmh.
> It now tests LinkedHashMap as well.
> I replaced use of Random with ThreadLocalRandom
>
> @Param("100")
> private int size;
>
> @Param
> private MapType mapType;
>
> public enum MapType {
> HASH_MAP,
> LINKED_HASH_MAP,
> }
>
>
>
> On Thu, Jan 3, 2019 at 12:31 PM Michal Vala  wrote:
>
>> Hi Martin,
>>
>> can we please finish this review?
>>
>> On 12/19/18 6:32 PM, Michal Vala wrote:
>> >
>> >
>> > On 12/19/18 4:15 PM, Martin Buchholz wrote:
>> >> On Wed, Dec 19, 2018 at 6:59 AM Roger Riggs 
>> wrote:
>> >>
>> >>> Hi Martin,
>> >>>
>> >>> It is also useful and conventional to print the seed of the random
>> >>> so that if necessary it can be reproduced.
>> >>>
>> >>
>> >> For many years, we've been using ThreadLocalRandom for testing, and
>> that
>> >> does not allow setting a seed.
>> >>
>> >> I remain unconvinced that saving a seed has value in the real world.
>> When
>> >> a randomized test fails, running it with sufficient iterations has
>> always
>> >> worked for me.
>> >>
>> >
>> > What's the reason behind using ThreadLocalRandom?
>> >
>> > In my opinion, reproducing the issue is key. One failure of randomized
>> test run
>> > might be caused by one issue, second run due to another issue. How we
>> reproduce
>> > then and how we know even how many issues we have? When we're running
>> random
>> > tests until it pass, it might even hide the issue.
>> >
>> > They sure have good value on reveal the issue, but then we have to know
>> how to
>> > reproduce and what we're searching for.
>> >
>> > If this case would not require ThreadLocalRandom and Random is enough,
>> I'd like
>> > to use that because of benefits I've mentioned.
>> >
>>
>> --
>> Michal Vala
>> OpenJDK QE
>> Red Hat Czech
>>
>

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2019-02-06 Thread Martin Buchholz

Finally got around to doing some work on this.
I'm happy with this latest revision:

https://cr.openjdk.java.net/~martin/webrevs/jdk/jsr166-integration/HashMap-resize/index.html

I did some more work on the microbenchmark, in part to force myself to
learn about jmh.
It now tests LinkedHashMap as well.
I replaced use of Random with ThreadLocalRandom

@Param("100")
private int size;

@Param
private MapType mapType;

public enum MapType {
HASH_MAP,
LINKED_HASH_MAP,
}



On Thu, Jan 3, 2019 at 12:31 PM Michal Vala  wrote:

> Hi Martin,
>
> can we please finish this review?
>
> On 12/19/18 6:32 PM, Michal Vala wrote:
> >
> >
> > On 12/19/18 4:15 PM, Martin Buchholz wrote:
> >> On Wed, Dec 19, 2018 at 6:59 AM Roger Riggs 
> wrote:
> >>
> >>> Hi Martin,
> >>>
> >>> It is also useful and conventional to print the seed of the random
> >>> so that if necessary it can be reproduced.
> >>>
> >>
> >> For many years, we've been using ThreadLocalRandom for testing, and that
> >> does not allow setting a seed.
> >>
> >> I remain unconvinced that saving a seed has value in the real world.
> When
> >> a randomized test fails, running it with sufficient iterations has
> always
> >> worked for me.
> >>
> >
> > What's the reason behind using ThreadLocalRandom?
> >
> > In my opinion, reproducing the issue is key. One failure of randomized
> test run
> > might be caused by one issue, second run due to another issue. How we
> reproduce
> > then and how we know even how many issues we have? When we're running
> random
> > tests until it pass, it might even hide the issue.
> >
> > They sure have good value on reveal the issue, but then we have to know
> how to
> > reproduce and what we're searching for.
> >
> > If this case would not require ThreadLocalRandom and Random is enough,
> I'd like
> > to use that because of benefits I've mentioned.
> >
>
> --
> Michal Vala
> OpenJDK QE
> Red Hat Czech
>

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2019-01-30 Thread Martin Buchholz

This one keeps bubbling to the very top of my todo stack, only to be
pre-empted a minute later by some other task.

On Tue, Jan 29, 2019 at 2:00 AM Michal Vala  wrote:

> Hi Martin,
>
> I'd like to finish this review. Are you still willing to sponsor this?
>
> Thanks!
>
> On 1/9/19 11:39 AM, Michal Vala wrote:
> > ping
> >
> > On 1/3/19 9:31 PM, Michal Vala wrote:
> >> Hi Martin,
> >>
> >> can we please finish this review?
> >>
> >> On 12/19/18 6:32 PM, Michal Vala wrote:
> >>>
> >>>
> >>> On 12/19/18 4:15 PM, Martin Buchholz wrote:
>  On Wed, Dec 19, 2018 at 6:59 AM Roger Riggs 
> wrote:
> 
> > Hi Martin,
> >
> > It is also useful and conventional to print the seed of the random
> > so that if necessary it can be reproduced.
> >
> 
>  For many years, we've been using ThreadLocalRandom for testing, and
> that
>  does not allow setting a seed.
> 
>  I remain unconvinced that saving a seed has value in the real world.
> When
>  a randomized test fails, running it with sufficient iterations has
> always
>  worked for me.
> 
> >>>
> >>> What's the reason behind using ThreadLocalRandom?
> >>>
> >>> In my opinion, reproducing the issue is key. One failure of randomized
> test
> >>> run might be caused by one issue, second run due to another issue. How
> we
> >>> reproduce then and how we know even how many issues we have? When
> we're
> >>> running random tests until it pass, it might even hide the issue.
> >>>
> >>> They sure have good value on reveal the issue, but then we have to
> know how
> >>> to reproduce and what we're searching for.
> >>>
> >>> If this case would not require ThreadLocalRandom and Random is enough,
> I'd
> >>> like to use that because of benefits I've mentioned.
> >>>
> >>
> >
>
> --
> Michal Vala
> OpenJDK QE
> Red Hat Czech
>

RE: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2019-01-29 Thread Patrick Zhang

This is a nice patch as I found the same problem back to December. Privately I 
was using "(size + s > threshold)" condition for my cases, and found Michal's 
early comment that this would lead to "unwanted space allocation" caused by 
duplicate keys, thanks.

Looking forward to having this in jdk/jdk trunk.

Regards
Patrick

-Original Message-
From: core-libs-dev  On Behalf Of 
Michal Vala
Sent: Tuesday, January 29, 2019 6:00 PM
To: Martin Buchholz 
Cc: core-libs-dev 
Subject: Re: JDK-8210280 - Unnecessary reallocation when invoking 
HashMap.putAll()

Hi Martin,

I'd like to finish this review. Are you still willing to sponsor this?

Thanks!

On 1/9/19 11:39 AM, Michal Vala wrote:
> ping
> 
> On 1/3/19 9:31 PM, Michal Vala wrote:
>> Hi Martin,
>>
>> can we please finish this review?
>>
>> On 12/19/18 6:32 PM, Michal Vala wrote:
>>>
>>>
>>> On 12/19/18 4:15 PM, Martin Buchholz wrote:
>>>> On Wed, Dec 19, 2018 at 6:59 AM Roger Riggs  wrote:
>>>>
>>>>> Hi Martin,
>>>>>
>>>>> It is also useful and conventional to print the seed of the random 
>>>>> so that if necessary it can be reproduced.
>>>>>
>>>>
>>>> For many years, we've been using ThreadLocalRandom for testing, and 
>>>> that does not allow setting a seed.
>>>>
>>>> I remain unconvinced that saving a seed has value in the real 
>>>> world.  When a randomized test fails, running it with sufficient 
>>>> iterations has always worked for me.
>>>>
>>>
>>> What's the reason behind using ThreadLocalRandom?
>>>
>>> In my opinion, reproducing the issue is key. One failure of 
>>> randomized test run might be caused by one issue, second run due to 
>>> another issue. How we reproduce then and how we know even how many 
>>> issues we have? When we're running random tests until it pass, it might 
>>> even hide the issue.
>>>
>>> They sure have good value on reveal the issue, but then we have to 
>>> know how to reproduce and what we're searching for.
>>>
>>> If this case would not require ThreadLocalRandom and Random is 
>>> enough, I'd like to use that because of benefits I've mentioned.
>>>
>>
> 

--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2019-01-29 Thread Michal Vala


Hi Martin,

I'd like to finish this review. Are you still willing to sponsor this?

Thanks!

On 1/9/19 11:39 AM, Michal Vala wrote:

ping

On 1/3/19 9:31 PM, Michal Vala wrote:

Hi Martin,

can we please finish this review?

On 12/19/18 6:32 PM, Michal Vala wrote:



On 12/19/18 4:15 PM, Martin Buchholz wrote:

On Wed, Dec 19, 2018 at 6:59 AM Roger Riggs  wrote:


Hi Martin,

It is also useful and conventional to print the seed of the random
so that if necessary it can be reproduced.



For many years, we've been using ThreadLocalRandom for testing, and that
does not allow setting a seed.

I remain unconvinced that saving a seed has value in the real world.  When
a randomized test fails, running it with sufficient iterations has always
worked for me.



What's the reason behind using ThreadLocalRandom?

In my opinion, reproducing the issue is key. One failure of randomized test 
run might be caused by one issue, second run due to another issue. How we 
reproduce then and how we know even how many issues we have? When we're 
running random tests until it pass, it might even hide the issue.


They sure have good value on reveal the issue, but then we have to know how 
to reproduce and what we're searching for.


If this case would not require ThreadLocalRandom and Random is enough, I'd 
like to use that because of benefits I've mentioned.








--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2019-01-09 Thread Michal Vala


ping

On 1/3/19 9:31 PM, Michal Vala wrote:

Hi Martin,

can we please finish this review?

On 12/19/18 6:32 PM, Michal Vala wrote:



On 12/19/18 4:15 PM, Martin Buchholz wrote:

On Wed, Dec 19, 2018 at 6:59 AM Roger Riggs  wrote:


Hi Martin,

It is also useful and conventional to print the seed of the random
so that if necessary it can be reproduced.



For many years, we've been using ThreadLocalRandom for testing, and that
does not allow setting a seed.

I remain unconvinced that saving a seed has value in the real world.  When
a randomized test fails, running it with sufficient iterations has always
worked for me.



What's the reason behind using ThreadLocalRandom?

In my opinion, reproducing the issue is key. One failure of randomized test 
run might be caused by one issue, second run due to another issue. How we 
reproduce then and how we know even how many issues we have? When we're 
running random tests until it pass, it might even hide the issue.


They sure have good value on reveal the issue, but then we have to know how to 
reproduce and what we're searching for.


If this case would not require ThreadLocalRandom and Random is enough, I'd 
like to use that because of benefits I've mentioned.






--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2019-01-03 Thread Michal Vala


Hi Martin,

can we please finish this review?

On 12/19/18 6:32 PM, Michal Vala wrote:



On 12/19/18 4:15 PM, Martin Buchholz wrote:

On Wed, Dec 19, 2018 at 6:59 AM Roger Riggs  wrote:


Hi Martin,

It is also useful and conventional to print the seed of the random
so that if necessary it can be reproduced.



For many years, we've been using ThreadLocalRandom for testing, and that
does not allow setting a seed.

I remain unconvinced that saving a seed has value in the real world.  When
a randomized test fails, running it with sufficient iterations has always
worked for me.



What's the reason behind using ThreadLocalRandom?

In my opinion, reproducing the issue is key. One failure of randomized test run 
might be caused by one issue, second run due to another issue. How we reproduce 
then and how we know even how many issues we have? When we're running random 
tests until it pass, it might even hide the issue.


They sure have good value on reveal the issue, but then we have to know how to 
reproduce and what we're searching for.


If this case would not require ThreadLocalRandom and Random is enough, I'd like 
to use that because of benefits I've mentioned.




--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-20 Thread Joe Darcy


Hello,

For some other libs test using randomness, we've been able to track down 
and fix bugs only after a seed value was printed and usable for 
reproducing the problem, see the history of:


* JDK-6854417: "testbug: java/util/regex/RegExTest.java fails 
intermittently."
* JDK-804: Rare bug in JISAutodetect charset detected by 
FindDecoderBugs test


The bad values which trigger the bug might only a very small subset of 
the full space. For the latter bug which was tracked down once the seed 
made the situation reproducible, see this comment from Martin in 2015 :-)


Sherman: Thanks for fixing this.  I've observed this in the wild a 
couple of times myself.


(Although flaky tests are annoying, finding actual bugs makes it so 
worth it!)


http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/034836.html

Please using the testing randomness library and the randomness jtreg 
keyword. The keyword helps guide failure analysis.


Thanks,

-Joe


On 12/19/2018 7:15 AM, Martin Buchholz wrote:

On Wed, Dec 19, 2018 at 6:59 AM Roger Riggs  wrote:


Hi Martin,

It is also useful and conventional to print the seed of the random
so that if necessary it can be reproduced.


For many years, we've been using ThreadLocalRandom for testing, and that
does not allow setting a seed.

I remain unconvinced that saving a seed has value in the real world.  When
a randomized test fails, running it with sufficient iterations has always
worked for me.

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-19 Thread Michal Vala





On 12/19/18 4:15 PM, Martin Buchholz wrote:

On Wed, Dec 19, 2018 at 6:59 AM Roger Riggs  wrote:


Hi Martin,

It is also useful and conventional to print the seed of the random
so that if necessary it can be reproduced.



For many years, we've been using ThreadLocalRandom for testing, and that
does not allow setting a seed.

I remain unconvinced that saving a seed has value in the real world.  When
a randomized test fails, running it with sufficient iterations has always
worked for me.



What's the reason behind using ThreadLocalRandom?

In my opinion, reproducing the issue is key. One failure of randomized test run 
might be caused by one issue, second run due to another issue. How we reproduce 
then and how we know even how many issues we have? When we're running random 
tests until it pass, it might even hide the issue.


They sure have good value on reveal the issue, but then we have to know how to 
reproduce and what we're searching for.


If this case would not require ThreadLocalRandom and Random is enough, I'd like 
to use that because of benefits I've mentioned.


--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-19 Thread Michal Vala


I was thinking about testing resizing with items in buckets.
Test like this. Can you please include it to changeset? (I can see you have 
prepared full webrev from changeset, so I'm not posting new one.)


@Test
public void resizeInternalTableWithBucketsTest() {
Map m = new HashMap();
// Default size of HashMap table is 16. Make sure there are multiple 
items in one bucket.

for (int i = 0; i < 4; i++) {
m.put((i * 16) + 7, 1);
}

Map m2 = new HashMap();
for (int i = 0; i < 64; i++) {
m2.put(-i, 2);
}

// put bigger map to ensure multiple resizing
m.putAll(m2);

// all original items should be there and properly rehashed
for (int i = 0; i < 4; i++) {
assertEquals(m.get((i * 16) + 7), 1);
}
}

On 12/19/18 3:47 PM, Martin Buchholz wrote:

Let's add whitebox tests for initial capacity and LinkedHashMap, as with
ConcurrentHashMap's whitebox tests.

--- src/test/jtreg/util/HashMap/WhiteBoxResizeTest.java 18 Dec 2018
20:21:24 - 1.1
+++ src/test/jtreg/util/HashMap/WhiteBoxResizeTest.java 19 Dec 2018
14:35:50 -
@@ -27,7 +27,11 @@
  import java.lang.invoke.MethodType;
  import java.lang.invoke.VarHandle;
  import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.List;
  import java.util.Map;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.function.Supplier;
  import java.util.stream.IntStream;

  import static java.util.stream.Collectors.toMap;
@@ -42,6 +46,7 @@
   * @run testng WhiteBoxResizeTest
   */
  public class WhiteBoxResizeTest {
+final ThreadLocalRandom rnd = ThreadLocalRandom.current();
  final MethodHandle TABLE_SIZE_FOR;
  final VarHandle THRESHOLD;
  final VarHandle TABLE;
@@ -91,14 +96,36 @@
  }

  @Test
-public void capacityTest() {
-HashMap map = new HashMap<>();
+public void capacityTestDefaultConstructor() {
+capacityTestDefaultConstructor(new HashMap<>());
+capacityTestDefaultConstructor(new LinkedHashMap<>());
+}
+
+void capacityTestDefaultConstructor(HashMap map) {
  assertNull(table(map));

  map.put(1, 1);
-assertEquals(capacity(map), 16);
+assertEquals(capacity(map), 16); // default initial capacity

  map.putAll(IntStream.range(0, 64).boxed().collect(toMap(i -> i, i
-> i)));
  assertEquals(capacity(map), 128);
  }
+
+@Test
+public void capacityTestInitialCapacity() {
+int initialCapacity = rnd.nextInt(1, 256);
+List>> suppliers = List.of(
+() -> new HashMap<>(initialCapacity),
+() -> new HashMap<>(initialCapacity, 0.75f),
+() -> new LinkedHashMap<>(initialCapacity),
+() -> new LinkedHashMap<>(initialCapacity, 0.75f));
+
+for (Supplier> supplier : suppliers) {
+HashMap map = supplier.get();
+assertNull(table(map));
+
+map.put(1, 1);
+assertEquals(capacity(map), tableSizeFor(initialCapacity));
+}
+}
  }



--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-19 Thread Martin Buchholz

On Wed, Dec 19, 2018 at 7:01 AM Chris Hegarty 
wrote:

>
> Also, please add the jtreg key so that the test is identifiable as using
> randomness:  `@key randomness` .
>

Done, but I remain unconvinced of the value of doing so.

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-19 Thread Martin Buchholz

On Wed, Dec 19, 2018 at 6:59 AM Roger Riggs  wrote:

> Hi Martin,
>
> It is also useful and conventional to print the seed of the random
> so that if necessary it can be reproduced.
>

For many years, we've been using ThreadLocalRandom for testing, and that
does not allow setting a seed.

I remain unconvinced that saving a seed has value in the real world.  When
a randomized test fails, running it with sufficient iterations has always
worked for me.

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-19 Thread Chris Hegarty



> On 19 Dec 2018, at 15:01, Chris Hegarty  wrote:
> 
> Martin,
> 
> Is it possible to use the the JDK’s test library jdk.test.lib.RandomFactory,
> so that the initial seed is captured and rep*l*ayable.

-Chris.

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-19 Thread Chris Hegarty

Martin,

Is it possible to use the the JDK’s test library jdk.test.lib.RandomFactory,
so that the initial seed is captured and repayable.

Also, please add the jtreg key so that the test is identifiable as using
randomness:  `@key randomness` .

-Chris.

> On 19 Dec 2018, at 14:47, Martin Buchholz  wrote:
> 
> Let's add whitebox tests for initial capacity and LinkedHashMap, as with
> ConcurrentHashMap's whitebox tests.
> 
> --- src/test/jtreg/util/HashMap/WhiteBoxResizeTest.java 18 Dec 2018
> 20:21:24 - 1.1
> +++ src/test/jtreg/util/HashMap/WhiteBoxResizeTest.java 19 Dec 2018
> 14:35:50 -
> @@ -27,7 +27,11 @@
> import java.lang.invoke.MethodType;
> import java.lang.invoke.VarHandle;
> import java.util.HashMap;
> +import java.util.LinkedHashMap;
> +import java.util.List;
> import java.util.Map;
> +import java.util.concurrent.ThreadLocalRandom;
> +import java.util.function.Supplier;
> import java.util.stream.IntStream;
> 
> import static java.util.stream.Collectors.toMap;
> @@ -42,6 +46,7 @@
>  * @run testng WhiteBoxResizeTest
>  */
> public class WhiteBoxResizeTest {
> +final ThreadLocalRandom rnd = ThreadLocalRandom.current();
> final MethodHandle TABLE_SIZE_FOR;
> final VarHandle THRESHOLD;
> final VarHandle TABLE;
> @@ -91,14 +96,36 @@
> }
> 
> @Test
> -public void capacityTest() {
> -HashMap map = new HashMap<>();
> +public void capacityTestDefaultConstructor() {
> +capacityTestDefaultConstructor(new HashMap<>());
> +capacityTestDefaultConstructor(new LinkedHashMap<>());
> +}
> +
> +void capacityTestDefaultConstructor(HashMap map) {
> assertNull(table(map));
> 
> map.put(1, 1);
> -assertEquals(capacity(map), 16);
> +assertEquals(capacity(map), 16); // default initial capacity
> 
> map.putAll(IntStream.range(0, 64).boxed().collect(toMap(i -> i, i
> -> i)));
> assertEquals(capacity(map), 128);
> }
> +
> +@Test
> +public void capacityTestInitialCapacity() {
> +int initialCapacity = rnd.nextInt(1, 256);
> +List>> suppliers = List.of(
> +() -> new HashMap<>(initialCapacity),
> +() -> new HashMap<>(initialCapacity, 0.75f),
> +() -> new LinkedHashMap<>(initialCapacity),
> +() -> new LinkedHashMap<>(initialCapacity, 0.75f));
> +
> +for (Supplier> supplier : suppliers) {
> +HashMap map = supplier.get();
> +assertNull(table(map));
> +
> +map.put(1, 1);
> +assertEquals(capacity(map), tableSizeFor(initialCapacity));
> +}
> +}
> }

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-19 Thread Roger Riggs


Hi Martin,

It is also useful and conventional to print the seed of the random
so that if necessary it can be reproduced.

Roger


On 12/19/2018 09:01 AM, Martin Buchholz wrote:

On Tue, Dec 18, 2018 at 10:48 PM Michal Vala  wrote:


Randomized test is not deterministic now. Can we have also original one?


We may have an irreconcilable difference of testing philosophy.
  (yes, mainstream testing thought has embraced determinism)

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-19 Thread Martin Buchholz

Let's add whitebox tests for initial capacity and LinkedHashMap, as with
ConcurrentHashMap's whitebox tests.

--- src/test/jtreg/util/HashMap/WhiteBoxResizeTest.java 18 Dec 2018
20:21:24 - 1.1
+++ src/test/jtreg/util/HashMap/WhiteBoxResizeTest.java 19 Dec 2018
14:35:50 -
@@ -27,7 +27,11 @@
 import java.lang.invoke.MethodType;
 import java.lang.invoke.VarHandle;
 import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.List;
 import java.util.Map;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.function.Supplier;
 import java.util.stream.IntStream;

 import static java.util.stream.Collectors.toMap;
@@ -42,6 +46,7 @@
  * @run testng WhiteBoxResizeTest
  */
 public class WhiteBoxResizeTest {
+final ThreadLocalRandom rnd = ThreadLocalRandom.current();
 final MethodHandle TABLE_SIZE_FOR;
 final VarHandle THRESHOLD;
 final VarHandle TABLE;
@@ -91,14 +96,36 @@
 }

 @Test
-public void capacityTest() {
-HashMap map = new HashMap<>();
+public void capacityTestDefaultConstructor() {
+capacityTestDefaultConstructor(new HashMap<>());
+capacityTestDefaultConstructor(new LinkedHashMap<>());
+}
+
+void capacityTestDefaultConstructor(HashMap map) {
 assertNull(table(map));

 map.put(1, 1);
-assertEquals(capacity(map), 16);
+assertEquals(capacity(map), 16); // default initial capacity

 map.putAll(IntStream.range(0, 64).boxed().collect(toMap(i -> i, i
-> i)));
 assertEquals(capacity(map), 128);
 }
+
+@Test
+public void capacityTestInitialCapacity() {
+int initialCapacity = rnd.nextInt(1, 256);
+List>> suppliers = List.of(
+() -> new HashMap<>(initialCapacity),
+() -> new HashMap<>(initialCapacity, 0.75f),
+() -> new LinkedHashMap<>(initialCapacity),
+() -> new LinkedHashMap<>(initialCapacity, 0.75f));
+
+for (Supplier> supplier : suppliers) {
+HashMap map = supplier.get();
+assertNull(table(map));
+
+map.put(1, 1);
+assertEquals(capacity(map), tableSizeFor(initialCapacity));
+}
+}
 }

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-19 Thread Michal Vala





On 12/19/18 3:01 PM, Martin Buchholz wrote:

On Tue, Dec 18, 2018 at 10:48 PM Michal Vala  wrote:


Randomized test is not deterministic now. Can we have also original one?



We may have an irreconcilable difference of testing philosophy.
  (yes, mainstream testing thought has embraced determinism)



Sure, that's probably true. All I'm asking is to have one more deterministic 
test which run few milliseconds. It's much easier to debug what is happening 
than randomized one. When someone will go with similar approach as I did first, 
this test stops him for sure and he/she will have faster feedback what's wrong.


--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-19 Thread Martin Buchholz

On Tue, Dec 18, 2018 at 10:48 PM Michal Vala  wrote:

> Randomized test is not deterministic now. Can we have also original one?
>

We may have an irreconcilable difference of testing philosophy.
 (yes, mainstream testing thought has embraced determinism)

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-18 Thread Michal Vala


Randomized test is not deterministic now. Can we have also original one?

On 12/18/18 10:29 PM, Martin Buchholz wrote:

We have changes in jsr166 CVS  ready for going into openjdk
https://cr.openjdk.java.net/~martin/webrevs/jdk/jsr166-integration/HashMap-resize/index.html
(except for the microbenchmark, which doesn't fit into our pipeline at the
moment).
Pardon the fiddling ...
---
Let's check in this order:

+while (s > threshold && table.length < MAXIMUM_CAPACITY)

---
Let's add more HashMap pseudo-methods to the test:

+
+Object[] table(HashMap map) {
+try {
+return (Object[]) TABLE.get(map);
+} catch (Throwable t) { throw new AssertionError(t); }
+}
+
+int capacity(HashMap map) {
+return table(map).length;
+}


and some more assertions:

+@Test
+public void capacityTest() {
+HashMap map = new HashMap<>();
+assertNull(table(map));
+
+map.put(1, 1);
+assertEquals(capacity(map), 16);
+
+map.putAll(IntStream.range(0, 64).boxed().collect(toMap(i ->
i, i -> i)));
+assertEquals(capacity(map), 128);
+}


I did my own TODO to randomize testBug8210280

On Mon, Dec 17, 2018 at 7:40 AM Michal Vala  wrote:


All tests I've run passed, benchmarks show ~15% performance boost for
putAllWithBigMapToNonEmptyMap.

On 12/17/18 7:32 AM, Michal Vala wrote:

Hi,

thanks Doug, this is nice reduction.

Here's the new webrev:
http://cr.openjdk.java.net/~mvala/jdk/jdk/JDK-8210280/webrev.03/

Just a nitpick, issue is in using linked-list in buckets. The same is

used for

both HashMap and LinkedHashMap, so mentioning just LinkedHashMap might

be

confising. I've updated the comment s/LinkedHashMap/linked-list buckets/.

I'm just running tier1 tests and benchmarks.

On 12/16/18 3:23 PM, Doug Lea wrote:


On 12/14/18 1:37 AM, Michal Vala wrote:

Thanks Martin for finding this serious issue and the testcase.



Sorry that I wasn't paying attention to this and so forced Martin to
discover the hard way that because of LinkeHashMap, you can't skip
doubling steps (at least not without a lot of rework). Also, the
documentation should have mentioned this. A simpler way to reduce
overhead in the case at hand is just to loop in putMapEntries:

--- HashMap.java.~1.9.~2018-11-11 15:43:24.982878495 -0500
+++ HashMap.java2018-12-16 09:05:48.924727867 -0500
@@ -502,8 +502,13 @@
   if (t > threshold)
   threshold = tableSizeFor(t);
   }
-else if (s > threshold)
-resize();
+else {
+// Because of LinkedHashMap constraints, we cannot
+// expand all at once, but can reduce total resize
+// effort by repeated doubling now vs later
+while (table.length <  MAXIMUM_CAPACITY && s >

threshold)

+resize();
+}
   for (Map.Entry e :

m.entrySet()) {

   K key = e.getKey();
   V value = e.getValue();





--
Michal Vala
OpenJDK QE
Red Hat Czech





--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-18 Thread Martin Buchholz

We have changes in jsr166 CVS  ready for going into openjdk
https://cr.openjdk.java.net/~martin/webrevs/jdk/jsr166-integration/HashMap-resize/index.html
(except for the microbenchmark, which doesn't fit into our pipeline at the
moment).
Pardon the fiddling ...
---
Let's check in this order:

+while (s > threshold && table.length < MAXIMUM_CAPACITY)

---
Let's add more HashMap pseudo-methods to the test:

+
+Object[] table(HashMap map) {
+try {
+return (Object[]) TABLE.get(map);
+} catch (Throwable t) { throw new AssertionError(t); }
+}
+
+int capacity(HashMap map) {
+return table(map).length;
+}


and some more assertions:

+@Test
+public void capacityTest() {
+HashMap map = new HashMap<>();
+assertNull(table(map));
+
+map.put(1, 1);
+assertEquals(capacity(map), 16);
+
+map.putAll(IntStream.range(0, 64).boxed().collect(toMap(i ->
i, i -> i)));
+assertEquals(capacity(map), 128);
+}


I did my own TODO to randomize testBug8210280

On Mon, Dec 17, 2018 at 7:40 AM Michal Vala  wrote:

> All tests I've run passed, benchmarks show ~15% performance boost for
> putAllWithBigMapToNonEmptyMap.
>
> On 12/17/18 7:32 AM, Michal Vala wrote:
> > Hi,
> >
> > thanks Doug, this is nice reduction.
> >
> > Here's the new webrev:
> > http://cr.openjdk.java.net/~mvala/jdk/jdk/JDK-8210280/webrev.03/
> >
> > Just a nitpick, issue is in using linked-list in buckets. The same is
> used for
> > both HashMap and LinkedHashMap, so mentioning just LinkedHashMap might
> be
> > confising. I've updated the comment s/LinkedHashMap/linked-list buckets/.
> >
> > I'm just running tier1 tests and benchmarks.
> >
> > On 12/16/18 3:23 PM, Doug Lea wrote:
> >>
> >> On 12/14/18 1:37 AM, Michal Vala wrote:
> >>> Thanks Martin for finding this serious issue and the testcase.
> >>>
> >>
> >> Sorry that I wasn't paying attention to this and so forced Martin to
> >> discover the hard way that because of LinkeHashMap, you can't skip
> >> doubling steps (at least not without a lot of rework). Also, the
> >> documentation should have mentioned this. A simpler way to reduce
> >> overhead in the case at hand is just to loop in putMapEntries:
> >>
> >> --- HashMap.java.~1.9.~2018-11-11 15:43:24.982878495 -0500
> >> +++ HashMap.java2018-12-16 09:05:48.924727867 -0500
> >> @@ -502,8 +502,13 @@
> >>   if (t > threshold)
> >>   threshold = tableSizeFor(t);
> >>   }
> >> -else if (s > threshold)
> >> -resize();
> >> +else {
> >> +// Because of LinkedHashMap constraints, we cannot
> >> +// expand all at once, but can reduce total resize
> >> +// effort by repeated doubling now vs later
> >> +while (table.length <  MAXIMUM_CAPACITY && s >
> threshold)
> >> +resize();
> >> +}
> >>   for (Map.Entry e :
> m.entrySet()) {
> >>   K key = e.getKey();
> >>   V value = e.getValue();
> >>
> >
>
> --
> Michal Vala
> OpenJDK QE
> Red Hat Czech
>

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-17 Thread Michal Vala

All tests I've run passed, benchmarks show ~15% performance boost for 
putAllWithBigMapToNonEmptyMap.


On 12/17/18 7:32 AM, Michal Vala wrote:

Hi,

thanks Doug, this is nice reduction.

Here's the new webrev: 
http://cr.openjdk.java.net/~mvala/jdk/jdk/JDK-8210280/webrev.03/


Just a nitpick, issue is in using linked-list in buckets. The same is used for 
both HashMap and LinkedHashMap, so mentioning just LinkedHashMap might be 
confising. I've updated the comment s/LinkedHashMap/linked-list buckets/.


I'm just running tier1 tests and benchmarks.

On 12/16/18 3:23 PM, Doug Lea wrote:


On 12/14/18 1:37 AM, Michal Vala wrote:

Thanks Martin for finding this serious issue and the testcase.



Sorry that I wasn't paying attention to this and so forced Martin to
discover the hard way that because of LinkeHashMap, you can't skip
doubling steps (at least not without a lot of rework). Also, the
documentation should have mentioned this. A simpler way to reduce
overhead in the case at hand is just to loop in putMapEntries:

--- HashMap.java.~1.9.~    2018-11-11 15:43:24.982878495 -0500
+++ HashMap.java    2018-12-16 09:05:48.924727867 -0500
@@ -502,8 +502,13 @@
  if (t > threshold)
  threshold = tableSizeFor(t);
  }
-    else if (s > threshold)
-    resize();
+    else {
+    // Because of LinkedHashMap constraints, we cannot
+    // expand all at once, but can reduce total resize
+    // effort by repeated doubling now vs later
+    while (table.length <  MAXIMUM_CAPACITY && s > threshold)
+    resize();
+    }
  for (Map.Entry e : m.entrySet()) {
  K key = e.getKey();
  V value = e.getValue();





--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-16 Thread Michal Vala


Hi,

thanks Doug, this is nice reduction.

Here's the new webrev: 
http://cr.openjdk.java.net/~mvala/jdk/jdk/JDK-8210280/webrev.03/


Just a nitpick, issue is in using linked-list in buckets. The same is used for 
both HashMap and LinkedHashMap, so mentioning just LinkedHashMap might be 
confising. I've updated the comment s/LinkedHashMap/linked-list buckets/.


I'm just running tier1 tests and benchmarks.

On 12/16/18 3:23 PM, Doug Lea wrote:


On 12/14/18 1:37 AM, Michal Vala wrote:

Thanks Martin for finding this serious issue and the testcase.



Sorry that I wasn't paying attention to this and so forced Martin to
discover the hard way that because of LinkeHashMap, you can't skip
doubling steps (at least not without a lot of rework). Also, the
documentation should have mentioned this. A simpler way to reduce
overhead in the case at hand is just to loop in putMapEntries:

--- HashMap.java.~1.9.~ 2018-11-11 15:43:24.982878495 -0500
+++ HashMap.java2018-12-16 09:05:48.924727867 -0500
@@ -502,8 +502,13 @@
  if (t > threshold)
  threshold = tableSizeFor(t);
  }
-else if (s > threshold)
-resize();
+else {
+// Because of LinkedHashMap constraints, we cannot
+// expand all at once, but can reduce total resize
+// effort by repeated doubling now vs later
+while (table.length <  MAXIMUM_CAPACITY && s > threshold)
+resize();
+}
  for (Map.Entry e : m.entrySet()) {
  K key = e.getKey();
  V value = e.getValue();



--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-16 Thread Doug Lea

On 12/14/18 1:37 AM, Michal Vala wrote:
> Thanks Martin for finding this serious issue and the testcase.
> 

Sorry that I wasn't paying attention to this and so forced Martin to
discover the hard way that because of LinkeHashMap, you can't skip
doubling steps (at least not without a lot of rework). Also, the
documentation should have mentioned this. A simpler way to reduce
overhead in the case at hand is just to loop in putMapEntries:

--- HashMap.java.~1.9.~ 2018-11-11 15:43:24.982878495 -0500
+++ HashMap.java2018-12-16 09:05:48.924727867 -0500
@@ -502,8 +502,13 @@
 if (t > threshold)
 threshold = tableSizeFor(t);
 }
-else if (s > threshold)
-resize();
+else {
+// Because of LinkedHashMap constraints, we cannot
+// expand all at once, but can reduce total resize
+// effort by repeated doubling now vs later
+while (table.length <  MAXIMUM_CAPACITY && s > threshold)
+resize();
+}
 for (Map.Entry e : m.entrySet()) {
 K key = e.getKey();
 V value = e.getValue();

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-14 Thread Michal Vala


Hi,

here's the new webrev: 
http://cr.openjdk.java.net/~mvala/jdk/jdk/JDK-8210280/webrev.02/


I solved the issue by incrementally doubling the new table, before adding new 
elements. This solution has no such performance boost as first buggy one, it's 
still measurable for case when adding big map to another, small non-empty map. 
This is faster because doubling happens with a lot less elements than it would 
happen while adding new elements.


one of benchmark runs
clean upstream:
HashMapBench.putAllWithBigMapToNonEmptyMap  avgt   10  189.055 ± 12.911  ms/op

with patch:
HashMapBench.putAllWithBigMapToNonEmptyMap  avgt   10  211.921 ± 23.793  ms/op



I've also tried generic approach without holding head and tail pointers to the 
bucket. However, need of preserve the order kills this solution. Tail pointer 
boost is too good.
Another generic approach would be to hold tail pointers just where needed, but I 
couldn't find effective solution for now (without e.g. having second table of 
the same size, which is killer).



I'm still looking into removeEldestEntry with LinkedHashMap.


On 12/14/18 7:37 AM, Michal Vala wrote:

Thanks Martin for finding this serious issue and the testcase.

I understand the issue, but so far I've been unable to find effective enough 
solution that beats high/low head/tail bucket splitting. I'll keep looking into 
it and I'll propose a new patch or write some summary and results of my 
experiments. Probably next week.


On 12/12/18 9:16 PM, Martin Buchholz wrote:

Software is hard.

I found myself removing the remaining style changes to be able to review
the changes.
(We're going to have to disagree about the value of curlies).
Here's my reduction:

--- src/main/java/util/HashMap.java 11 Nov 2018 16:27:28 - 1.9
+++ src/main/java/util/HashMap.java 12 Dec 2018 20:10:03 -
@@ -503,7 +503,7 @@
  threshold = tableSizeFor(t);
  }
  else if (s > threshold)
-    resize();
+    resize(s);
  for (Map.Entry e : m.entrySet()) {
  K key = e.getKey();
  V value = e.getValue();
@@ -661,6 +661,30 @@
  }

  /**
+ * Resizes the table to the nearest power of two to {@code size}.
+ * Moves all items to the new table.
+ *
+ * @param size expected number of elements in the new table
+ * @return the table
+ */
+    final Node[] resize(int size) {
+    if (size < 0) {
+    throw new IllegalArgumentException("Negative number of
elements does not make sense.");
+    }
+    Node[] oldTable = table;
+    int oldCapacity = (oldTable == null) ? 0 : oldTable.length;
+    int newCapacity = tableSizeFor(size);
+
+    // need to resize?
+    if (newCapacity > oldCapacity) {
+    threshold = (int) ((float) newCapacity * loadFactor);
+    return createTableAndMoveElements(newCapacity, oldCapacity,
oldTable);
+    } else {
+    return oldTable;
+    }
+    }
+
+    /**
   * Initializes or doubles table size.  If null, allocates in
   * accord with initial capacity target held in field threshold.
   * Otherwise, because we are using power-of-two expansion, the
@@ -695,6 +719,11 @@
    (int)ft : Integer.MAX_VALUE);
  }
  threshold = newThr;
+
+    return createTableAndMoveElements(newCap, oldCap, oldTab);
+    }
+
+    private Node[] createTableAndMoveElements(int newCap, int oldCap,
Node[] oldTab) {
  @SuppressWarnings({"rawtypes","unchecked"})
  Node[] newTab = (Node[])new Node[newCap];
  table = newTab;


Here's a test that fails with the proposed patch:

https://cr.openjdk.java.net/~martin/webrevs/jdk/jsr166-integration/miscellaneous/index.html 



 /**
  * "Missing" test found while investigating JDK-8210280.
  * See discussion on mailing list.
  * TODO: randomize
  */
 public void testBug8210280() {
 Map m = impl.emptyMap();
 for (int i = 0; i < 4; i++) m.put(7 + i * 16, 0);
 Map more = impl.emptyMap();
 for (int i = 0; i < 128; i++) more.put(-i, 42);
 m.putAll(more);
 for (int i = 0; i < 4; i++) assertEquals(0, m.get(7 + i * 16));
 }





--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-13 Thread Michal Vala


Thanks, looking into that.

On 12/12/18 9:29 PM, Martin Buchholz wrote:

When hacking on HashMap, one always needs to keep LinkedHashMap in the back
of one's mind.

If you use removeEldestEntry then the occupancy can never get beyond some
limit.  It may be weird to have elements that are part of a putAll being
removed during the call to putAll.  There's already a call to resize
causing the backing array to grow in this (very corner) case, so it's
arguably not introducing new weird behavior.  If removeEldestEntry is
overridden then ideally you would only grow by doubling, but it's such a
corner case this may not be worth doing.



--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-13 Thread Michal Vala


Thanks Martin for finding this serious issue and the testcase.

I understand the issue, but so far I've been unable to find effective enough 
solution that beats high/low head/tail bucket splitting. I'll keep looking into 
it and I'll propose a new patch or write some summary and results of my 
experiments. Probably next week.


On 12/12/18 9:16 PM, Martin Buchholz wrote:

Software is hard.

I found myself removing the remaining style changes to be able to review
the changes.
(We're going to have to disagree about the value of curlies).
Here's my reduction:

--- src/main/java/util/HashMap.java 11 Nov 2018 16:27:28 - 1.9
+++ src/main/java/util/HashMap.java 12 Dec 2018 20:10:03 -
@@ -503,7 +503,7 @@
  threshold = tableSizeFor(t);
  }
  else if (s > threshold)
-resize();
+resize(s);
  for (Map.Entry e : m.entrySet()) {
  K key = e.getKey();
  V value = e.getValue();
@@ -661,6 +661,30 @@
  }

  /**
+ * Resizes the table to the nearest power of two to {@code size}.
+ * Moves all items to the new table.
+ *
+ * @param size expected number of elements in the new table
+ * @return the table
+ */
+final Node[] resize(int size) {
+if (size < 0) {
+throw new IllegalArgumentException("Negative number of
elements does not make sense.");
+}
+Node[] oldTable = table;
+int oldCapacity = (oldTable == null) ? 0 : oldTable.length;
+int newCapacity = tableSizeFor(size);
+
+// need to resize?
+if (newCapacity > oldCapacity) {
+threshold = (int) ((float) newCapacity * loadFactor);
+return createTableAndMoveElements(newCapacity, oldCapacity,
oldTable);
+} else {
+return oldTable;
+}
+}
+
+/**
   * Initializes or doubles table size.  If null, allocates in
   * accord with initial capacity target held in field threshold.
   * Otherwise, because we are using power-of-two expansion, the
@@ -695,6 +719,11 @@
(int)ft : Integer.MAX_VALUE);
  }
  threshold = newThr;
+
+return createTableAndMoveElements(newCap, oldCap, oldTab);
+}
+
+private Node[] createTableAndMoveElements(int newCap, int oldCap,
Node[] oldTab) {
  @SuppressWarnings({"rawtypes","unchecked"})
  Node[] newTab = (Node[])new Node[newCap];
  table = newTab;


Here's a test that fails with the proposed patch:

https://cr.openjdk.java.net/~martin/webrevs/jdk/jsr166-integration/miscellaneous/index.html

 /**
  * "Missing" test found while investigating JDK-8210280.
  * See discussion on mailing list.
  * TODO: randomize
  */
 public void testBug8210280() {
 Map m = impl.emptyMap();
 for (int i = 0; i < 4; i++) m.put(7 + i * 16, 0);
 Map more = impl.emptyMap();
 for (int i = 0; i < 128; i++) more.put(-i, 42);
 m.putAll(more);
 for (int i = 0; i < 4; i++) assertEquals(0, m.get(7 + i * 16));
 }



--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-12 Thread Martin Buchholz

When hacking on HashMap, one always needs to keep LinkedHashMap in the back
of one's mind.

If you use removeEldestEntry then the occupancy can never get beyond some
limit.  It may be weird to have elements that are part of a putAll being
removed during the call to putAll.  There's already a call to resize
causing the backing array to grow in this (very corner) case, so it's
arguably not introducing new weird behavior.  If removeEldestEntry is
overridden then ideally you would only grow by doubling, but it's such a
corner case this may not be worth doing.

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-12 Thread Martin Buchholz

Software is hard.

I found myself removing the remaining style changes to be able to review
the changes.
(We're going to have to disagree about the value of curlies).
Here's my reduction:

--- src/main/java/util/HashMap.java 11 Nov 2018 16:27:28 - 1.9
+++ src/main/java/util/HashMap.java 12 Dec 2018 20:10:03 -
@@ -503,7 +503,7 @@
 threshold = tableSizeFor(t);
 }
 else if (s > threshold)
-resize();
+resize(s);
 for (Map.Entry e : m.entrySet()) {
 K key = e.getKey();
 V value = e.getValue();
@@ -661,6 +661,30 @@
 }

 /**
+ * Resizes the table to the nearest power of two to {@code size}.
+ * Moves all items to the new table.
+ *
+ * @param size expected number of elements in the new table
+ * @return the table
+ */
+final Node[] resize(int size) {
+if (size < 0) {
+throw new IllegalArgumentException("Negative number of
elements does not make sense.");
+}
+Node[] oldTable = table;
+int oldCapacity = (oldTable == null) ? 0 : oldTable.length;
+int newCapacity = tableSizeFor(size);
+
+// need to resize?
+if (newCapacity > oldCapacity) {
+threshold = (int) ((float) newCapacity * loadFactor);
+return createTableAndMoveElements(newCapacity, oldCapacity,
oldTable);
+} else {
+return oldTable;
+}
+}
+
+/**
  * Initializes or doubles table size.  If null, allocates in
  * accord with initial capacity target held in field threshold.
  * Otherwise, because we are using power-of-two expansion, the
@@ -695,6 +719,11 @@
   (int)ft : Integer.MAX_VALUE);
 }
 threshold = newThr;
+
+return createTableAndMoveElements(newCap, oldCap, oldTab);
+}
+
+private Node[] createTableAndMoveElements(int newCap, int oldCap,
Node[] oldTab) {
 @SuppressWarnings({"rawtypes","unchecked"})
 Node[] newTab = (Node[])new Node[newCap];
 table = newTab;


Here's a test that fails with the proposed patch:

https://cr.openjdk.java.net/~martin/webrevs/jdk/jsr166-integration/miscellaneous/index.html

/**
 * "Missing" test found while investigating JDK-8210280.
 * See discussion on mailing list.
 * TODO: randomize
 */
public void testBug8210280() {
Map m = impl.emptyMap();
for (int i = 0; i < 4; i++) m.put(7 + i * 16, 0);
Map more = impl.emptyMap();
for (int i = 0; i < 128; i++) more.put(-i, 42);
m.putAll(more);
for (int i = 0; i < 4; i++) assertEquals(0, m.get(7 + i * 16));
}

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-12 Thread Michal Vala

thank you Roger, I've added jmh benchmark. see new webrev in reply to Martin's 
review.



On 12/12/18 4:45 AM, Roger Riggs wrote:

Hi,

fyi, jmh tests appropriate for JDK apis are located in the test/micro/... 
directories.


The benchmarks are built with:
   % make build-microbenchmark

The results are in the build//images/test/micro/benchmarks.jar

$.02, Roger

On 12/11/2018 A 9:55 PM, Martin Buchholz wrote:

Hi Michal, pleased to meet you.  I'll be your sponsor.

The test will need a legal header, presumably similar to others authored by
redhatters.

It is now possible to check in jmh microbenchmarks (but I've never done so
myself).

The current coding style is non-standard, but deliberate; avoid gratuitous
changes like
-    Node e;
+    Node e;

As author of concurrent/ConcurrentHashMap/WhiteBox.java ...
- I thought you'd need a @modules ... did this test actually pass?
- you should have some kind of @summary that includes the word "whitebox"
- why not "throws ReflectiveOperationException" ?
- your reviewer is a bit on the autistic side, so please change /** to /*
on the @test comment (it's not javadoc!)


On Tue, Dec 11, 2018 at 11:06 AM Michal Vala  wrote:


Hi,

I've been looking into this bug in HashMap where resize() is called
multiple
times when putting whole map into another.

I come up with following patch:
http://cr.openjdk.java.net/~mvala/jdk/jdk/JDK-8210280/webrev.00/

I've tried to do it as little invasive as possible. New resize(int) method
is
called just from putMapEntries for now. Notice that method is called with
size
of the new map. I was experimenting with calling it with 'size()+s', but
this
leads to unwanted space allocation when inserted map rewrites a lot of
existing
keys.

I've benchmarked this with adding 10millions elements into existing map
and it
gets ~50% improvement:

unpatched
Benchmark    Mode  Cnt  Score   Error  Units
MyBenchmark.testMethod  thrpt   10  1.248 ± 0.058  ops/s

patched
Benchmark    Mode  Cnt  Score   Error  Units
MyBenchmark.testMethod  thrpt   10  1.872 ± 0.109  ops/s

public class MyBenchmark {
  static Map mapTocopy = IntStream.range(1,
10_000_000).boxed().collect(Collectors.toMap(k -> k,
  k -> k));

  @Benchmark
  public int testMethod() {
  var map = new HashMap();
  map.put(-5, -5);
  map.putAll(mapTocopy);
  return map.size();
  }
}

Any ideas for missed corner-cases or some good tests?

--
Michal Vala
OpenJDK QE
Red Hat Czech





--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-12 Thread Michal Vala


Hi Martin,
thanks for review and sponsorship. All fixed, details inlined.

new webrev: http://cr.openjdk.java.net/~mvala/jdk/jdk/JDK-8210280/webrev.01/

On 12/12/18 3:55 AM, Martin Buchholz wrote:

Hi Michal, pleased to meet you.  I'll be your sponsor.

The test will need a legal header, presumably similar to others authored by
redhatters.

fixed



It is now possible to check in jmh microbenchmarks (but I've never done so
myself).

done



The current coding style is non-standard, but deliberate; avoid gratuitous
changes like
-Node e;
+Node e;

Fixed.
I insist on adding {} to if/else as it is extremely prone to errors. I can add 
it in different changeset if you like.




As author of concurrent/ConcurrentHashMap/WhiteBox.java ... > - I thought you'd need a @modules ... did this test actually pass?It pass but 

with WARNING: An illegal reflective access operation has occurred. Fixed.


- you should have some kind of @summary that includes the word "whitebox"

fixed


- why not "throws ReflectiveOperationException" ?

fixed


- your reviewer is a bit on the autistic side, so please change /** to /*
on the @test comment (it's not javadoc!)
fixed (other tests in HashMap have also /**. I guess it does not worth the 
separate fix. or?)





On Tue, Dec 11, 2018 at 11:06 AM Michal Vala  wrote:


Hi,

I've been looking into this bug in HashMap where resize() is called
multiple
times when putting whole map into another.

I come up with following patch:
http://cr.openjdk.java.net/~mvala/jdk/jdk/JDK-8210280/webrev.00/

I've tried to do it as little invasive as possible. New resize(int) method
is
called just from putMapEntries for now. Notice that method is called with
size
of the new map. I was experimenting with calling it with 'size()+s', but
this
leads to unwanted space allocation when inserted map rewrites a lot of
existing
keys.

I've benchmarked this with adding 10millions elements into existing map
and it
gets ~50% improvement:

unpatched
BenchmarkMode  Cnt  Score   Error  Units
MyBenchmark.testMethod  thrpt   10  1.248 ± 0.058  ops/s

patched
BenchmarkMode  Cnt  Score   Error  Units
MyBenchmark.testMethod  thrpt   10  1.872 ± 0.109  ops/s

public class MyBenchmark {
  static Map mapTocopy = IntStream.range(1,
10_000_000).boxed().collect(Collectors.toMap(k -> k,
  k -> k));

  @Benchmark
  public int testMethod() {
  var map = new HashMap();
  map.put(-5, -5);
  map.putAll(mapTocopy);
  return map.size();
  }
}

Any ideas for missed corner-cases or some good tests?

--
Michal Vala
OpenJDK QE
Red Hat Czech





--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-11 Thread Roger Riggs


Hi,

fyi, jmh tests appropriate for JDK apis are located in the 
test/micro/... directories.


The benchmarks are built with:
  % make build-microbenchmark

The results are in the build//images/test/micro/benchmarks.jar

$.02, Roger

On 12/11/2018 A 9:55 PM, Martin Buchholz wrote:

Hi Michal, pleased to meet you.  I'll be your sponsor.

The test will need a legal header, presumably similar to others authored by
redhatters.

It is now possible to check in jmh microbenchmarks (but I've never done so
myself).

The current coding style is non-standard, but deliberate; avoid gratuitous
changes like
-Node e;
+Node e;

As author of concurrent/ConcurrentHashMap/WhiteBox.java ...
- I thought you'd need a @modules ... did this test actually pass?
- you should have some kind of @summary that includes the word "whitebox"
- why not "throws ReflectiveOperationException" ?
- your reviewer is a bit on the autistic side, so please change /** to /*
on the @test comment (it's not javadoc!)


On Tue, Dec 11, 2018 at 11:06 AM Michal Vala  wrote:


Hi,

I've been looking into this bug in HashMap where resize() is called
multiple
times when putting whole map into another.

I come up with following patch:
http://cr.openjdk.java.net/~mvala/jdk/jdk/JDK-8210280/webrev.00/

I've tried to do it as little invasive as possible. New resize(int) method
is
called just from putMapEntries for now. Notice that method is called with
size
of the new map. I was experimenting with calling it with 'size()+s', but
this
leads to unwanted space allocation when inserted map rewrites a lot of
existing
keys.

I've benchmarked this with adding 10millions elements into existing map
and it
gets ~50% improvement:

unpatched
BenchmarkMode  Cnt  Score   Error  Units
MyBenchmark.testMethod  thrpt   10  1.248 ± 0.058  ops/s

patched
BenchmarkMode  Cnt  Score   Error  Units
MyBenchmark.testMethod  thrpt   10  1.872 ± 0.109  ops/s

public class MyBenchmark {
  static Map mapTocopy = IntStream.range(1,
10_000_000).boxed().collect(Collectors.toMap(k -> k,
  k -> k));

  @Benchmark
  public int testMethod() {
  var map = new HashMap();
  map.put(-5, -5);
  map.putAll(mapTocopy);
  return map.size();
  }
}

Any ideas for missed corner-cases or some good tests?

--
Michal Vala
OpenJDK QE
Red Hat Czech

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

2018-12-11 Thread Martin Buchholz

Hi Michal, pleased to meet you.  I'll be your sponsor.

The test will need a legal header, presumably similar to others authored by
redhatters.

It is now possible to check in jmh microbenchmarks (but I've never done so
myself).

The current coding style is non-standard, but deliberate; avoid gratuitous
changes like
-Node e;
+Node e;

As author of concurrent/ConcurrentHashMap/WhiteBox.java ...
- I thought you'd need a @modules ... did this test actually pass?
- you should have some kind of @summary that includes the word "whitebox"
- why not "throws ReflectiveOperationException" ?
- your reviewer is a bit on the autistic side, so please change /** to /*
on the @test comment (it's not javadoc!)


On Tue, Dec 11, 2018 at 11:06 AM Michal Vala  wrote:

> Hi,
>
> I've been looking into this bug in HashMap where resize() is called
> multiple
> times when putting whole map into another.
>
> I come up with following patch:
> http://cr.openjdk.java.net/~mvala/jdk/jdk/JDK-8210280/webrev.00/
>
> I've tried to do it as little invasive as possible. New resize(int) method
> is
> called just from putMapEntries for now. Notice that method is called with
> size
> of the new map. I was experimenting with calling it with 'size()+s', but
> this
> leads to unwanted space allocation when inserted map rewrites a lot of
> existing
> keys.
>
> I've benchmarked this with adding 10millions elements into existing map
> and it
> gets ~50% improvement:
>
> unpatched
> BenchmarkMode  Cnt  Score   Error  Units
> MyBenchmark.testMethod  thrpt   10  1.248 ± 0.058  ops/s
>
> patched
> BenchmarkMode  Cnt  Score   Error  Units
> MyBenchmark.testMethod  thrpt   10  1.872 ± 0.109  ops/s
>
> public class MyBenchmark {
>  static Map mapTocopy = IntStream.range(1,
> 10_000_000).boxed().collect(Collectors.toMap(k -> k,
>  k -> k));
>
>  @Benchmark
>  public int testMethod() {
>  var map = new HashMap();
>  map.put(-5, -5);
>  map.putAll(mapTocopy);
>  return map.size();
>  }
> }
>
> Any ideas for missed corner-cases or some good tests?
>
> --
> Michal Vala
> OpenJDK QE
> Red Hat Czech
>

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

RE: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

Re: JDK-8210280 - Unnecessary reallocation when invoking HashMap.putAll()

33 matches

Site Navigation

Mail list logo

Footer information