Re: RFR: 8247532: Records deserialization is slow

2020-06-23 Thread Claes Redestad




On 2020-06-23 10:06, Claes Redestad wrote:

Hi,

On 2020-06-23 09:49, Peter Levart wrote:

Hi Chris, Claes,


Ok then, here it is with the benchmark included:


http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.07/ 




I haven't been able to run the benchmark with "make test" though. I 
have tried various ways to pass javac options to the build, like:



make test TEST='micro:org.openjdk.bench.java.io.RecordDeserialization' 
TEST_OPTS="VM_OPTIONS=--enable-preview --release=16"



...but javac doesn't seem to get them. Is there some secret option to 
achieve that?


Hmm, we might as well have the microbenchmarks build with
--enable-preview on by default. Try this:


Fixed:

diff -r f2e1cd498381 make/test/BuildMicrobenchmark.gmk
--- a/make/test/BuildMicrobenchmark.gmk Tue Jun 23 10:08:35 2020 +0200
+++ b/make/test/BuildMicrobenchmark.gmk Tue Jun 23 10:33:17 2020 +0200
@@ -90,10 +90,11 @@
     TARGET_RELEASE := $(TARGET_RELEASE_NEWJDK_UPGRADED), \
     SMALL_JAVA := false, \
     CLASSPATH := $(MICROBENCHMARK_CLASSPATH), \
-    DISABLED_WARNINGS := processing rawtypes cast serial, \
+    DISABLED_WARNINGS := processing rawtypes cast serial preview, \
     SRC := $(MICROBENCHMARK_SRC), \
     BIN := $(MICROBENCHMARK_CLASSES), \
     JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules java.management, \
+    JAVAC_FLAGS := --enable-preview, \
 ))

 $(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)

I verified this works with your micro, and it doesn't seem to cause
any issues elsewhere:

 make test TEST=micro:RecordDeserialization

I can shepherd this as a separate fix for documentation purposes, but
feel free to include it in your patch and ping build-dev@..

/Claes



diff -r 52741f85bf23 make/test/BuildMicrobenchmark.gmk
--- a/make/test/BuildMicrobenchmark.gmk    Tue Jun 23 09:54:42 2020 +0200
+++ b/make/test/BuildMicrobenchmark.gmk    Tue Jun 23 09:59:29 2020 +0200
@@ -93,7 +93,7 @@
     DISABLED_WARNINGS := processing rawtypes cast serial, \
     SRC := $(MICROBENCHMARK_SRC), \
     BIN := $(MICROBENCHMARK_CLASSES), \
-    JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules java.management, \
+    JAVA_FLAGS := --enable-preview --add-modules jdk.unsupported --limit-modules java.management, \
 ))

 $(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)




Otherwise, I simulated what would happen when there are more than 10 
ObjectStreamClass layouts for the same class rapidly interchanging, so 
that they push each other out of the cache, by temporarily setting the 
cache's MAX_SIZE = 0. Note that this is the worst-case scenario:



Benchmark                                      (length)  Mode  Cnt    Score    Error  Units
RecordDeserializationBench.deserializeClasses        10  avgt   10    9.393 ±  0.287  us/op
RecordDeserializationBench.deserializeClasses       100  avgt   10   35.642 ±  0.977  us/op
RecordDeserializationBench.deserializeClasses      1000  avgt   10  293.769 ±  7.321  us/op
RecordDeserializationBench.deserializeRecords        10  avgt   10   15.335 ±  0.496  us/op
RecordDeserializationBench.deserializeRecords       100  avgt   10  211.427 ± 11.908  us/op
RecordDeserializationBench.deserializeRecords      1000  avgt   10  990.398 ± 26.681  us/op



This is using the JMH option '-gc true' to force GC after each iteration 
of the benchmark. Without it, I get a big (~4s) full-GC pause right in 
the middle of a run with length=100:



Iteration   1: 528.577 us/op
Iteration   2: 580.404 us/op
Iteration   3: 4438.228 us/op
Iteration   4: 644.532 us/op
Iteration   5: 698.493 us/op
Iteration   6: 800.738 us/op
Iteration   7: 929.791 us/op
Iteration   8: 870.946 us/op
Iteration   9: 863.416 us/op
Iteration  10: 916.508 us/op


...so results are a bit off because of that:


Benchmark                                      (length)  Mode  Cnt     Score      Error  Units
RecordDeserializationBench.deserializeClasses        10  avgt   10     8.263 ±    0.043  us/op
RecordDeserializationBench.deserializeClasses       100  avgt   10    33.406 ±    0.160  us/op
RecordDeserializationBench.deserializeClasses      1000  avgt   10   287.595 ±    0.960  us/op
RecordDeserializationBench.deserializeRecords        10  avgt   10    15.270 ±    0.080  us/op
RecordDeserializationBench.deserializeRecords       100  avgt   10  1127.163 ± 1771.892  us/op
RecordDeserializationBench.deserializeRecords      1000  avgt   10  2003.235 ±  227.159  us/op



Yes, there is quite a bit of GCing going on when the cache is thrashing. 
Note that I haven't tuned GC in any way, and I'm running this on a 
machine with 64GiB of RAM, so the heap is allowed to grow quite big, and 
G1 is used by default, I think.
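For reference, these runs can also be reproduced by executing the 
benchmarks jar directly, which makes it easy to pass JMH options such as 
'-gc true' (the jar path below assumes the usual JDK build output layout 
and may differ in your configuration):

    java --enable-preview -jar build/<conf>/images/test/micro/benchmarks.jar \
        RecordDeserialization -gc true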



This is still no worse than before the patch:


Benchmark                                 (length)  Mode  Cnt   Score  Error  Units
RecordDeserialization.deserializeClasses        10  avgt   10   8.382 ± 0.013  us/op
RecordDeserialization.deserializeClasses       100  avgt   10  33.736 ± 0.171  us/op

Re: RFR: 8247532: Records deserialization is slow

2020-06-23 Thread Claes Redestad

Hi,

On 2020-06-23 09:49, Peter Levart wrote:

Hi Chris, Claes,


Ok then, here it is with the benchmark included:


http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.07/


I haven't been able to run the benchmark with "make test" though. I have 
tried various ways to pass javac options to the build, like:



make test TEST='micro:org.openjdk.bench.java.io.RecordDeserialization' 
TEST_OPTS="VM_OPTIONS=--enable-preview --release=16"


...but javac doesn't seem to get them. Is there some secret option to 
achieve that?


Hmm, we might as well have the microbenchmarks build with
--enable-preview on by default. Try this:

diff -r 52741f85bf23 make/test/BuildMicrobenchmark.gmk
--- a/make/test/BuildMicrobenchmark.gmk Tue Jun 23 09:54:42 2020 +0200
+++ b/make/test/BuildMicrobenchmark.gmk Tue Jun 23 09:59:29 2020 +0200
@@ -93,7 +93,7 @@
     DISABLED_WARNINGS := processing rawtypes cast serial, \
     SRC := $(MICROBENCHMARK_SRC), \
     BIN := $(MICROBENCHMARK_CLASSES), \
-    JAVA_FLAGS := --add-modules jdk.unsupported --limit-modules java.management, \
+    JAVA_FLAGS := --enable-preview --add-modules jdk.unsupported --limit-modules java.management, \
 ))

 $(BUILD_JDK_MICROBENCHMARK): $(JMH_COMPILE_JARS)




Otherwise, I simulated what would happen when there are more than 10 
ObjectStreamClass layouts for the same class rapidly interchanging, so that 
they push each other out of the cache, by temporarily setting the cache's 
MAX_SIZE = 0. Note that this is the worst-case scenario:



Benchmark                                      (length)  Mode  Cnt    Score    Error  Units
RecordDeserializationBench.deserializeClasses        10  avgt   10    9.393 ±  0.287  us/op
RecordDeserializationBench.deserializeClasses       100  avgt   10   35.642 ±  0.977  us/op
RecordDeserializationBench.deserializeClasses      1000  avgt   10  293.769 ±  7.321  us/op
RecordDeserializationBench.deserializeRecords        10  avgt   10   15.335 ±  0.496  us/op
RecordDeserializationBench.deserializeRecords       100  avgt   10  211.427 ± 11.908  us/op
RecordDeserializationBench.deserializeRecords      1000  avgt   10  990.398 ± 26.681  us/op


This is using the JMH option '-gc true' to force GC after each iteration of 
the benchmark. Without it, I get a big (~4s) full-GC pause right in the 
middle of a run with length=100:



Iteration   1: 528.577 us/op
Iteration   2: 580.404 us/op
Iteration   3: 4438.228 us/op
Iteration   4: 644.532 us/op
Iteration   5: 698.493 us/op
Iteration   6: 800.738 us/op
Iteration   7: 929.791 us/op
Iteration   8: 870.946 us/op
Iteration   9: 863.416 us/op
Iteration  10: 916.508 us/op


...so results are a bit off because of that:


Benchmark                                      (length)  Mode  Cnt     Score      Error  Units
RecordDeserializationBench.deserializeClasses        10  avgt   10     8.263 ±    0.043  us/op
RecordDeserializationBench.deserializeClasses       100  avgt   10    33.406 ±    0.160  us/op
RecordDeserializationBench.deserializeClasses      1000  avgt   10   287.595 ±    0.960  us/op
RecordDeserializationBench.deserializeRecords        10  avgt   10    15.270 ±    0.080  us/op
RecordDeserializationBench.deserializeRecords       100  avgt   10  1127.163 ± 1771.892  us/op
RecordDeserializationBench.deserializeRecords      1000  avgt   10  2003.235 ±  227.159  us/op


Yes, there is quite a bit of GCing going on when the cache is thrashing. 
Note that I haven't tuned GC in any way, and I'm running this on a 
machine with 64GiB of RAM, so the heap is allowed to grow quite big, and 
G1 is used by default, I think.



This is still no worse than before the patch:


Benchmark                                 (length)  Mode  Cnt     Score   Error  Units
RecordDeserialization.deserializeClasses        10  avgt   10     8.382 ±  0.013  us/op
RecordDeserialization.deserializeClasses       100  avgt   10    33.736 ±  0.171  us/op
RecordDeserialization.deserializeClasses      1000  avgt   10   271.224 ±  0.953  us/op
RecordDeserialization.deserializeRecords        10  avgt   10    58.606 ±  0.446  us/op
RecordDeserialization.deserializeRecords       100  avgt   10   530.044 ±  1.752  us/op
RecordDeserialization.deserializeRecords      1000  avgt   10  5335.624 ± 44.942  us/op


... since caching of the adapted method handle for multiple objects within 
a single stream is still effective.


I doubt there will ever be more than 10 variants/versions of the same 
record class deserialized by the same VM in rapid succession, so I 
think this should not be an issue. We could add a system property to 
control the MAX_SIZE of the cache if you think it is needed.


Thanks for running the numbers on this, and I agree - it seems
outlandishly improbable (most will only see one, and if you have to
maintain 10+ different serialized shapes of some record you likely have 
bigger problems).


I'd say let's keep it constant unless someone actually asks for it.

/Claes




Regards, Peter


On 6/22/20 1:04 AM, Claes 

Re: RFR: 8247532: Records deserialization is slow

2020-06-23 Thread Peter Levart

Hi Chris, Claes,


Ok then, here it is with the benchmark included:


http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.07/


I haven't been able to run the benchmark with "make test" though. I have 
tried various ways to pass javac options to the build, like:



make test TEST='micro:org.openjdk.bench.java.io.RecordDeserialization' 
TEST_OPTS="VM_OPTIONS=--enable-preview --release=16"


...but javac doesn't seem to get them. Is there some secret option to 
achieve that?



Otherwise, I simulated what would happen when there are more than 10 
ObjectStreamClass layouts for the same class rapidly interchanging, so that 
they push each other out of the cache, by temporarily setting the cache's 
MAX_SIZE = 0. Note that this is the worst-case scenario:



Benchmark                                      (length)  Mode  Cnt    Score    Error  Units
RecordDeserializationBench.deserializeClasses        10  avgt   10    9.393 ±  0.287  us/op
RecordDeserializationBench.deserializeClasses       100  avgt   10   35.642 ±  0.977  us/op
RecordDeserializationBench.deserializeClasses      1000  avgt   10  293.769 ±  7.321  us/op
RecordDeserializationBench.deserializeRecords        10  avgt   10   15.335 ±  0.496  us/op
RecordDeserializationBench.deserializeRecords       100  avgt   10  211.427 ± 11.908  us/op
RecordDeserializationBench.deserializeRecords      1000  avgt   10  990.398 ± 26.681  us/op


This is using the JMH option '-gc true' to force GC after each iteration of 
the benchmark. Without it, I get a big (~4s) full-GC pause right in the 
middle of a run with length=100:



Iteration   1: 528.577 us/op
Iteration   2: 580.404 us/op
Iteration   3: 4438.228 us/op
Iteration   4: 644.532 us/op
Iteration   5: 698.493 us/op
Iteration   6: 800.738 us/op
Iteration   7: 929.791 us/op
Iteration   8: 870.946 us/op
Iteration   9: 863.416 us/op
Iteration  10: 916.508 us/op


...so results are a bit off because of that:


Benchmark                                      (length)  Mode  Cnt     Score      Error  Units
RecordDeserializationBench.deserializeClasses        10  avgt   10     8.263 ±    0.043  us/op
RecordDeserializationBench.deserializeClasses       100  avgt   10    33.406 ±    0.160  us/op
RecordDeserializationBench.deserializeClasses      1000  avgt   10   287.595 ±    0.960  us/op
RecordDeserializationBench.deserializeRecords        10  avgt   10    15.270 ±    0.080  us/op
RecordDeserializationBench.deserializeRecords       100  avgt   10  1127.163 ± 1771.892  us/op
RecordDeserializationBench.deserializeRecords      1000  avgt   10  2003.235 ±  227.159  us/op


Yes, there is quite a bit of GCing going on when the cache is thrashing. 
Note that I haven't tuned GC in any way, and I'm running this on a 
machine with 64GiB of RAM, so the heap is allowed to grow quite big, and 
G1 is used by default, I think.



This is still no worse than before the patch:


Benchmark                                 (length)  Mode  Cnt     Score   Error  Units
RecordDeserialization.deserializeClasses        10  avgt   10     8.382 ±  0.013  us/op
RecordDeserialization.deserializeClasses       100  avgt   10    33.736 ±  0.171  us/op
RecordDeserialization.deserializeClasses      1000  avgt   10   271.224 ±  0.953  us/op
RecordDeserialization.deserializeRecords        10  avgt   10    58.606 ±  0.446  us/op
RecordDeserialization.deserializeRecords       100  avgt   10   530.044 ±  1.752  us/op
RecordDeserialization.deserializeRecords      1000  avgt   10  5335.624 ± 44.942  us/op


... since caching of the adapted method handle for multiple objects within 
a single stream is still effective.


I doubt there will ever be more than 10 variants/versions of the same 
record class deserialized by the same VM in rapid succession, so I 
think this should not be an issue. We could add a system property to 
control the MAX_SIZE of the cache if you think it is needed.



Regards, Peter


On 6/22/20 1:04 AM, Claes Redestad wrote:

Hi Peter,

patch and results look great!

My only real comment on this is that I think the microbenchmark would be
a valuable contribution, too.

It'd also be interesting to explore how poor performance would become if
we'd hit the (artificial) 11-layouts limit, e.g., by cycling through
10, 11, or 12 different shapes.

/Claes

On 2020-06-21 19:16, Peter Levart wrote:

Hi,


When re-running the benchmark [1] with different lengths of 
serialized arrays of records, I found that, compared to classical 
classes, lookup into the cache of adapted method handles starts to 
show when the length of the array is larger (# of instances of the 
same record type deserialized in a single stream). Each record 
deserialized must look up the method handle in a ConcurrentHashMap:



Benchmark                                    (length)  Mode  Cnt   Score  Error  Units
RecordSerializationBench.deserializeClasses        10  avgt   10   8.088 ± 0.013  us/op
RecordSerializationBench.deserializeClasses       100  avgt   10  32.171 ± 0.324  us/op

Re: RFR: 8247532: Records deserialization is slow

2020-06-22 Thread Chris Hegarty
Peter,

Thank you for taking the suggestion for additional test scenarios, improving 
coverage, and finding-and-fixing the issues in my hurried test builder (it was 
only meant as a rough guide/idea).

I agree with Claes, it would be good to include the microbenchmark in 
test/micro/..

I think this version, 0.6, is very good. There is quite a bit of code to deal 
with the caching and lookup, but I don’t see an existing alternative 
implementation in the JDK that could be reused or moulded to fit this 
particular use case.

I experimented a little with spinning a class to hold the transformed MH in the 
constant pool, and accessing it through a SAM type, just to see if it had any 
positive effect on the benchmark numbers - it did not.

-Chris.


> On 21 Jun 2020, at 18:16, Peter Levart  wrote:
> 
> Hi,
> 
> 
> 
> When re-running the benchmark [1] with different lengths of serialized arrays 
> of records, I found that, compared to classical classes, lookup into the 
> cache of adapted method handles starts to show when the length of the array is 
> larger (# of instances of the same record type deserialized in a single 
> stream). Each record deserialized must look up the method handle in a 
> ConcurrentHashMap:
> 
> 
> 
> Benchmark                                    (length)  Mode  Cnt    Score  Error  Units
> RecordSerializationBench.deserializeClasses        10  avgt   10    8.088 ± 0.013  us/op
> RecordSerializationBench.deserializeClasses       100  avgt   10   32.171 ± 0.324  us/op
> RecordSerializationBench.deserializeClasses      1000  avgt   10  279.762 ± 3.072  us/op
> RecordSerializationBench.deserializeRecords        10  avgt   10    9.011 ± 0.027  us/op
> RecordSerializationBench.deserializeRecords       100  avgt   10   33.206 ± 0.514  us/op
> RecordSerializationBench.deserializeRecords      1000  avgt   10  325.137 ± 0.969  us/op
> 
> 
> ...so keeping the correctly shaped adapted method handle in the 
> per-serialization-session ObjectStreamClass instance [2] starts to make sense:
> 
> 
> 
> Benchmark                                    (length)  Mode  Cnt    Score  Error  Units
> RecordSerializationBench.deserializeClasses        10  avgt   10    8.681 ± 0.155  us/op
> RecordSerializationBench.deserializeClasses       100  avgt   10   32.496 ± 0.087  us/op
> RecordSerializationBench.deserializeClasses      1000  avgt   10  279.014 ± 1.189  us/op
> RecordSerializationBench.deserializeRecords        10  avgt   10    8.537 ± 0.032  us/op
> RecordSerializationBench.deserializeRecords       100  avgt   10   31.451 ± 0.083  us/op
> RecordSerializationBench.deserializeRecords      1000  avgt   10  250.854 ± 2.772  us/op
> 
> 
> With that, more objects means an advantage over classical classes instead of 
> a disadvantage.
> 
> 
> 
> [1] 
> http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/RecordSerializationBench.java
>  
> 
> [2] 
> http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.06/ 
> 
> 
> Regards, Peter
> 
> 
> 



Re: RFR: 8247532: Records deserialization is slow

2020-06-21 Thread Claes Redestad

Hi Peter,

patch and results look great!

My only real comment on this is that I think the microbenchmark would be
a valuable contribution, too.

It'd also be interesting to explore how poor performance would become if
we'd hit the (artificial) 11-layouts limit, e.g., by cycling through
10, 11, or 12 different shapes.

/Claes

On 2020-06-21 19:16, Peter Levart wrote:

Hi,


When re-running the benchmark [1] with different lengths of serialized 
arrays of records, I found that, compared to classical classes, lookup 
into the cache of adapted method handles starts to show when the length 
of the array is larger (# of instances of the same record type 
deserialized in a single stream). Each record deserialized must look up 
the method handle in a ConcurrentHashMap:



Benchmark                                    (length)  Mode  Cnt    Score  Error  Units
RecordSerializationBench.deserializeClasses        10  avgt   10    8.088 ± 0.013  us/op
RecordSerializationBench.deserializeClasses       100  avgt   10   32.171 ± 0.324  us/op
RecordSerializationBench.deserializeClasses      1000  avgt   10  279.762 ± 3.072  us/op
RecordSerializationBench.deserializeRecords        10  avgt   10    9.011 ± 0.027  us/op
RecordSerializationBench.deserializeRecords       100  avgt   10   33.206 ± 0.514  us/op
RecordSerializationBench.deserializeRecords      1000  avgt   10  325.137 ± 0.969  us/op



...so keeping the correctly shaped adapted method handle in the 
per-serialization-session ObjectStreamClass instance [2] starts to make 
sense:



Benchmark                                    (length)  Mode  Cnt    Score  Error  Units
RecordSerializationBench.deserializeClasses        10  avgt   10    8.681 ± 0.155  us/op
RecordSerializationBench.deserializeClasses       100  avgt   10   32.496 ± 0.087  us/op
RecordSerializationBench.deserializeClasses      1000  avgt   10  279.014 ± 1.189  us/op
RecordSerializationBench.deserializeRecords        10  avgt   10    8.537 ± 0.032  us/op
RecordSerializationBench.deserializeRecords       100  avgt   10   31.451 ± 0.083  us/op
RecordSerializationBench.deserializeRecords      1000  avgt   10  250.854 ± 2.772  us/op



With that, more objects means an advantage over classical classes instead 
of a disadvantage.



[1] 
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/RecordSerializationBench.java 



[2] 
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.06/ 




Regards, Peter




Re: RFR: 8247532: Records deserialization is slow

2020-06-21 Thread Peter Levart

Hi,


When re-running the benchmark [1] with different lengths of serialized 
arrays of records, I found that, compared to classical classes, lookup 
into the cache of adapted method handles starts to show when the length 
of the array is larger (# of instances of the same record type 
deserialized in a single stream). Each record deserialized must look up 
the method handle in a ConcurrentHashMap:



Benchmark                                    (length)  Mode  Cnt    Score  Error  Units
RecordSerializationBench.deserializeClasses        10  avgt   10    8.088 ± 0.013  us/op
RecordSerializationBench.deserializeClasses       100  avgt   10   32.171 ± 0.324  us/op
RecordSerializationBench.deserializeClasses      1000  avgt   10  279.762 ± 3.072  us/op
RecordSerializationBench.deserializeRecords        10  avgt   10    9.011 ± 0.027  us/op
RecordSerializationBench.deserializeRecords       100  avgt   10   33.206 ± 0.514  us/op
RecordSerializationBench.deserializeRecords      1000  avgt   10  325.137 ± 0.969  us/op


...so keeping the correctly shaped adapted method handle in the 
per-serialization-session ObjectStreamClass instance [2] starts to make 
sense:



Benchmark                                    (length)  Mode  Cnt    Score  Error  Units
RecordSerializationBench.deserializeClasses        10  avgt   10    8.681 ± 0.155  us/op
RecordSerializationBench.deserializeClasses       100  avgt   10   32.496 ± 0.087  us/op
RecordSerializationBench.deserializeClasses      1000  avgt   10  279.014 ± 1.189  us/op
RecordSerializationBench.deserializeRecords        10  avgt   10    8.537 ± 0.032  us/op
RecordSerializationBench.deserializeRecords       100  avgt   10   31.451 ± 0.083  us/op
RecordSerializationBench.deserializeRecords      1000  avgt   10  250.854 ± 2.772  us/op


With that, more objects means an advantage over classical classes instead 
of a disadvantage.
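To illustrate the approach (a minimal made-up sketch, not the actual patch 
code): the canonical constructor is adapted once per stream shape into a 
handle of the fixed type (Object[]) -> R, so constructing each deserialized 
instance is a single invokeExact with no per-instance argument binding:

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class AdaptOnceSketch {
    record Point(int x, int y) { }

    // Build (Object[]) -> Point once per stream shape: each canonical
    // constructor argument is fed by an extractor that reads a fixed index
    // out of the array of stream values and unboxes it to the component type.
    static MethodHandle adaptedCtor() throws ReflectiveOperationException {
        MethodHandle ctor = MethodHandles.lookup().findConstructor(
            Point.class, MethodType.methodType(void.class, int.class, int.class));
        MethodHandle getter = MethodHandles.arrayElementGetter(Object[].class);
        MethodHandle x = MethodHandles.insertArguments(getter, 1, 0)
            .asType(MethodType.methodType(int.class, Object[].class));
        MethodHandle y = MethodHandles.insertArguments(getter, 1, 1)
            .asType(MethodType.methodType(int.class, Object[].class));
        // (Object[], Object[]) -> Point, then merge both array arguments into one
        MethodHandle mh = MethodHandles.filterArguments(ctor, 0, x, y);
        return MethodHandles.permuteArguments(mh,
            MethodType.methodType(Point.class, Object[].class), 0, 0);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle mh = adaptedCtor();                          // adapt once
        Point p = (Point) mh.invokeExact(new Object[] { 1, 2 });  // per instance
        System.out.println(p);                                    // Point[x=1, y=2]
    }
}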



[1] 
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/RecordSerializationBench.java


[2] 
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.06/



Regards, Peter




Re: RFR: 8247532: Records deserialization is slow

2020-06-21 Thread Peter Levart

Hi Chris,


Here's a small optimization over webrev.04 in the caching logic. We don't 
need a doubly-linked list to implement a FIFO; a single 'next' link 
in each key is sufficient. I also changed the initialCapacity of the 
ConcurrentHashMap to 2 (which should be the typical number of cached shapes 
per record class when there's more than one, and requires 4 slots in the 
internal table) and the MAX_SIZE of the cache to 10 (the greatest 
number that requires 16 slots in the internal table)...
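For readers skimming the thread, here is a minimal sketch of that scheme 
(names are made up and the webrev differs in details): lookups stay 
lock-free on the ConcurrentHashMap, while insertions are serialized and 
evict the least recently added shape once MAX_SIZE is exceeded, using only 
the single 'next' link:

import java.lang.invoke.MethodHandle;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

final class ShapeCache {
    private static final int MAX_SIZE = 10;

    static final class Key {
        final String[] names;
        final Class<?>[] types;
        Key next; // single link threading the FIFO; no doubly-linked list needed

        Key(String[] names, Class<?>[] types) {
            this.names = names;
            this.types = types;
        }

        @Override public boolean equals(Object o) {
            return o instanceof Key k
                && Arrays.equals(names, k.names)
                && Arrays.equals(types, k.types);
        }

        @Override public int hashCode() {
            return Arrays.hashCode(names) * 31 + Arrays.hashCode(types);
        }
    }

    private final ConcurrentHashMap<Key, MethodHandle> map = new ConcurrentHashMap<>(2);
    private Key head, tail; // insertion order, guarded by synchronized(this)

    MethodHandle get(Key key) {
        return map.get(key); // lock-free fast path, unaffected by the bound
    }

    synchronized MethodHandle putIfAbsent(Key key, MethodHandle mh) {
        MethodHandle existing = map.putIfAbsent(key, mh);
        if (existing != null) return existing;
        if (tail == null) head = key; else tail.next = key;
        tail = key;
        if (map.size() > MAX_SIZE) { // evict the least recently added shape
            Key evicted = head;
            head = evicted.next;
            map.remove(evicted);
        }
        return mh;
    }
}

The point of the single link is that eviction only ever removes from the 
head of the FIFO, so no back-pointers are needed.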



http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.05/


Regards, Peter




Re: RFR: 8247532: Records deserialization is slow

2020-06-21 Thread Peter Levart

Hi Chris,


Good work on SerialByteStreamBuilder. A very useful tool.


What I noticed in the tests is that you usually follow the following 
pattern:


- create a record instance r

- serialize it to bytes and deserialize it back to instance deser1

- create a stream with SerialByteStreamBuilder and deserialize it to 
instance deser2


- compare deser1 with deser2


Now if there was a bug in the deserialization logic, it would be possible for 
both deser1 and deser2 to be equal but still wrong. Adding another 
assert to compare r with deser1 in each test could catch such a bug.



In the following test:


 245 public void testArrays() throws Exception {
 246     out.println("\n---");
 247     {
 248         record IntArray(int[] ints, long[] longs) implements Serializable { }
 249         IntArray r = new IntArray(new int[] { 5, 4, 3, 2, 1 }, new long[] { 9L });
 250         IntArray deser1 = deserialize(serialize(r));
 251
 252         byte[] builderBytes = SerialByteStreamBuilder
 253             .newBuilder(IntArray.class.getName())
 254             .addField("ints", String[].class, new int[] { 5, 4, 3, 2, 1 })
 255             .addField("longs", String[].class, new long[] { 9L })
 256             .build();
 257
 258         IntArray deser2 = deserialize(builderBytes);
 259         assertEquals(deser2.ints(), deser1.ints());
 260         assertEquals(deser2.longs(), deser1.longs());
 261     }


...in lines 254 and 255, you specify the wrong types for the fields (making 
the method generic with a type parameter could prevent such mistakes). I 
wonder why the test passes. In both variants (with or without the patch) 
we have this check in java.io.ObjectStreamClass.RecordSupport:



    for (int i = 0; i < fields.length; i++) {
        ObjectStreamField f = fields[i];
        String fName = f.getName();
        if (!fName.equals(pName))
            continue;

        Class<?> fType = f.getField().getType();

        if (!pType.isAssignableFrom(fType))
            throw new InternalError(fName + " unassignable, pType:" + pType +
                                    ", fType:" + fType);
    }


... which is actually checking the types of the reflected 
java.lang.reflect.Field(s) against the types of the reflected 
java.lang.reflect.RecordComponent(s) that match in name - and these 
always match when the record type is correctly constructed. And later, 
when values are deserialized and assigned to constructor parameters, we 
have the right types of values in this "unusual" stream. In other words, 
changing the field type from one reference type to another is always a 
compatible change as far as the stream format is concerned; 
deserialization then succeeds if the value in the stream is of the 
correct type for the local type of the field. Theoretically the change 
can be such that the reference types are unrelated, because such types 
have one common value: null. In this example the stream was unusual: the 
types were unrelated, but the stream had the correct type of value != null.



Anyway, maybe the test should also contain a case where the type of a 
field changes compatibly (for example, an Integer field serialized, then 
deserialized into a Number field), and one or more where it changes 
incompatibly. For the incompatible changes, I think some exceptions 
would be emitted by the MethodHandle adapters that try to adapt 
incompatible reference types, while other incompatible primitive field 
type changes would be caught by the 
java.io.ObjectStreamClass.matchFields() method...
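A sketch of what the compatible-change case could look like, written in 
the style of the quoted test (SerialByteStreamBuilder, deserialize and 
assertEquals are the test's helpers; the addField signature is assumed 
from the snippet above):

    record Holder(Number n) implements Serializable { }
    Holder r = new Holder(42);

    // the stream declares the field with the narrower type Integer, while
    // the local record component has the wider type Number
    byte[] bytes = SerialByteStreamBuilder
            .newBuilder(Holder.class.getName())
            .addField("n", Integer.class, 42)
            .build();

    Holder deser = deserialize(bytes);
    assertEquals(deser, r);  // succeeds: the Integer value is assignable to Number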



I think that the following method in SerialByteStreamBuilder is also 
incorrect:



 363 private void writeObjectFieldDesc(DataOutputStream out) throws IOException {
 364     for (Map.Entry entry : objectFields.entrySet()) {
 365         Class<?> cl = entry.getKey().type();
 366         assert !cl.isPrimitive();
 367         // obj_typecode
 368         if (cl.isArray()) {
 369             out.writeByte('[');
 370         } else {
 371             out.writeByte('L');
 372         }
 373         writeUTF(out, entry.getKey().name());
 374         out.writeByte(TC_STRING);
 375         writeUTF(out, "L" + cl.getName().replace('.', '/') + ";");
 376
 377     }
 378 }


Line 375 should probably be:

    writeUTF(out, (cl.isArray() ? cl.getName() : "L" + cl.getName() + ";")
                  .replace('.', '/'));


So I took the liberty of incorporating your work into my patch (we will 
both be authors of the patch) and addressed the above concerns. The 
modified test does not pass on JDK14. In testIncompatibleRefFieldTypeChange, 
JDK14 throws ClassCastException (because it is thrown from the binding of 
arguments to the method handle) while the patched JDK16 throws 
InvalidObjectException, because ClassCastException is thrown from 
invokeExact, which is then 

Re: RFR: 8247532: Records deserialization is slow

2020-06-21 Thread Peter Levart



On 6/17/20 4:47 PM, fo...@univ-mlv.fr wrote:

- Original Message -

From: "Peter Levart" 
... TLDR;
So for each record type
(Class<?>) there can be several distinct
ObjectStreamClasses deserialized in the same VM that were produced by
serializing different versions of record types in different VMs...

Ok, I see, you want to cache all permutations, not only the one corresponding to 
the current record order.

Rémi



Right. So while the most prevalent shape of the stream will be the one 
produced by the same version of the class as is currently used in the 
local VM, there can be situations where this is not true. For example, 
you upgrade some service with a new version of a class, but want to support 
older clients as well as new ones. If older clients suddenly experienced a 
drop in deserialization performance, it would not be nice.



Regards, Peter




On 6/17/20 1:06 AM, Remi Forax wrote:

Hi Peter,
is there a reason not to use a ClassValue using the record class
as key instead of the more complex keys you are proposing?

Rémi


Re: RFR: 8247532: Records deserialization is slow

2020-06-17 Thread forax
- Original Message -
> From: "Peter Levart" 
> To: "Remi Forax" 
> Cc: "core-libs-dev" , "Chris Hegarty" 
> 
> Sent: Wednesday, 17 June 2020 07:37:16
> Subject: Re: RFR: 8247532: Records deserialization is slow

> Hi Remi,
> 
> 
> The keys used are based on the ordered names and types of "fields" in
> the ObjectStreamClass that has been deserialized as part of the object
> stream for the record "type" being deserialized. So for each record type
> (Class<?>) there can be several distinct
> ObjectStreamClasses deserialized in the same VM that were produced by
> serializing different versions of record types in different VMs or
> ClassLoader(s) and that may vary in the names and/or types and/or order
> of fields. And since the MethodHandle chain of transformations that is
> being used for a particular ObjectStreamClass is dependent on the
> ordered names and types of ObjectStreamField(s) of deserialized
> ObjectStreamClass, the key is based on that too.
> 
> You may ask why not simply add a MethodHandle field to the
> ObjectStreamClass instance that is deserialized and therefore unique?
> Well, such ObjectStreamClass only lasts for one "session" of
> ObjectStream deserialization, so such caching wouldn't be efficient. But
> common parts of that ObjectStreamClass that are only dependent on the
> current local-VM type are cached in a special instance of
> ObjectStreamClass and that's where I put a cache of deserialization
> constructors in the form of a Map.

Ok, I see, you want to cache all permutations, not only the one corresponding to 
the current record order.

> 
> 
> Regards, Peter

Rémi

> 
> 
> On 6/17/20 1:06 AM, Remi Forax wrote:
>> Hi Peter,
>> is there a reason not to use a ClassValue using the record class
>> as key instead of the more complex keys you are proposing?
>>
>> Rémi
>>
>> - Original Message -
>>> From: "Chris Hegarty" 
>>> To: "Peter Levart" 
>>> Cc: "core-libs-dev" 
>>> Sent: Tuesday, 16 June 2020 17:15:03
>>> Subject: Re: RFR: 8247532: Records deserialization is slow
>>> Hi Peter,
>>>
>>>> On 14 Jun 2020, at 17:28, Peter Levart  wrote:
>>>>
>>>> ...
>>>>
>>>> [2]
>>>> http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.01/
>>>>
>>> I think that this is very good. It’s clever to build a chain of method 
>>> handles
>>> that can be invoked with the stream field values. This is a nice separation
>>> between the shape of the data and the actual stream data.
>>>
>>> The caching is on a per-stream-field shape basis, which should be 
>>> consistent in
>>> the common case, but of course is not always guaranteed to be the case, 
>>> hence
>>> the need for the cache. I think that this should be fine, since the
>>> ObjectStreamClass ( that holds the cache ) is already itself cached as a 
>>> weak
>>> reference. But I did wonder if the size of this new cache should be limited.
>>> Probably not worth the complexity unless it is an obvious issue.
>>>
>>> All the serializable records tests pass successfully with your patch. Good. I
>>> did however notice that there is just a single test, DefaultValuesTest, that
>>> exercises different stream shapes for the same class in the stream. Would be
>>> good to expand coverage in this area, but of course some lower-level
>>> test-specific building blocks will be needed to help build the specially
>>> crafted byte streams - I can help with this.
>>>
>>> Overall I think that this change is good.
>>>
>>> -Chris.


Re: RFR: 8247532: Records deserialization is slow

2020-06-17 Thread Chris Hegarty
Peter,

> On 17 Jun 2020, at 09:23, Chris Hegarty  wrote:
> 
>> On 17 Jun 2020, at 06:44, Peter Levart  wrote:
 ..
>>> If there is a way for a test to compile several versions of the same 
>>> (record) class and then produce byte streams with it, then we could 
>>> simulate various class-evolution cases (field missing, surplus field, 
>>> change of field type, etc...) without crafting the bytestream by hand. WDYT?
>> 
>> I have a better idea. The test could contain several different classes with 
>> different names that would be used to generate variations of the serialized 
>> stream. Such streams could then be "patched" so they would contain the 
>> common target class name and then used to (attempt to) deserialize the 
>> target class. I'll try to devise such a test…
>> 
> I think that could work. I’ll mock something up too, just for comparison 
> purposes.


Here is an initial version of a test that can be expanded to cover more of the 
stream-field shapes of serializable records.

I quickly backed away from generating the byte streams using OOS alone, since 
it enforces strict ordering of the fields in the stream, beyond that of 
primitives followed by non-primitives. And I want to be able to do things like, 
for example, record Point(int x, int y) - a stream of x:1 y:2, or y:2 x:1 - 
which will verify the nominality aspect of the caching implementation. So I 
opted for a basic byte-stream-builder approach.

https://cr.openjdk.java.net/~chegar/serialFieldTest_webrev/test/jdk/java/io/Serializable/records/DifferentStreamFieldsTest.java.html

I also would like a builder of serial byte streams anyway, as it will be useful 
for things beyond this change. Maybe it could even be expanded upon and used as 
a test library, at some future point. The above implementation can probably be 
easily broken if pushed hard with larger graphs of objects, but it should be 
good enough for what we need here.
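For illustration, the Point example above would look something like this 
with the builder (same assumed helpers as in the test; primitive field 
support is assumed too):

    record Point(int x, int y) implements Serializable { }

    byte[] xThenY = SerialByteStreamBuilder
            .newBuilder(Point.class.getName())
            .addField("x", int.class, 1)
            .addField("y", int.class, 2)
            .build();

    byte[] yThenX = SerialByteStreamBuilder
            .newBuilder(Point.class.getName())
            .addField("y", int.class, 2)
            .addField("x", int.class, 1)
            .build();

    // both stream shapes should deserialize to the same Point, exercising
    // the name-based (nominal) matching regardless of field order
    assertEquals(deserialize(xThenY), deserialize(yThenX));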

-Chris.



Re: RFR: 8247532: Records deserialization is slow

2020-06-17 Thread Chris Hegarty



> On 17 Jun 2020, at 06:44, Peter Levart  wrote:
>>> ..
>> If there is a way for a test to compile several versions of the same 
>> (record) class and then produce byte streams with it, then we could simulate 
>> various class-evolution cases (field missing, surplus field, change of field 
>> type, etc...) without crafting the bytestream by hand. WDYT?
> 
> I have a better idea. The test could contain several different classes with 
> different names that would be used to generate variations of the serialized 
> stream. Such streams could then be "patched" so they would contain the common 
> target class name and then used to (attempt to) deserialize the target class. 
> I'll try to devise such a test…
> 
I think that could work. I’ll mock something up too, just for comparison 
purposes.

-Chris.



Re: RFR: 8247532: Records deserialization is slow

2020-06-17 Thread Peter Levart



On 6/17/20 8:08 AM, Peter Levart wrote:


On 6/16/20 5:15 PM, Chris Hegarty wrote:
The caching is on a per-stream-field shape basis, which should be 
consistent in the common case, but of course is not always guaranteed 
to be the case, hence the need for the cache. I think that this 
should be fine, since the ObjectStreamClass ( that holds the cache ) 
is already itself cached as a weak reference. But I did wonder if the 
size of this new cache should be limited. Probably not worth the 
complexity unless it is an obvious issue.


I don't think there will normally be many different shapes of the same 
class deserialized by a single VM. Each shape means that a different 
version of that class must have existed to serialize it. There could 
be deliberate "forged" streams trying to inflate the cache. Are you 
worried about that? In that case I can add logic to limit the number 
of different shapes kept with a simple LRA (Least Recently Added) 
strategy that would not hurt access performance.



Regards, Peter

So, here's an attempt to limit the number of different shapes cached per 
class that doesn't affect the fast-path performance:



http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.03/


Regards, Peter




Re: RFR: 8247532: Records deserialization is slow

2020-06-17 Thread Chris Hegarty



> On 17 Jun 2020, at 07:08, Peter Levart  wrote:
> 
> 
> On 6/16/20 5:15 PM, Chris Hegarty wrote:
>> The caching is on a per-stream-field shape basis, which should be consistent 
>> in the common case, but of course is not always guaranteed to be the case, 
>> hence the need for the cache. I think that this should be fine, since the 
>> ObjectStreamClass ( that holds the cache ) is already itself cached as a 
>> weak reference. But I did wonder if the size of this new cache should be 
>> limited. Probably not worth the complexity unless it is an obvious issue.
> 
> I don't think there will normally be many different shapes of the same class 
> deserialized by a single VM. Each shape means that a different version of 
> that class must have existed to serialize it.

Right.

> There could be deliberate "forged" streams trying to inflate the cache. Are 
> you worried about that?

Yes.

> In that case I can add logic to limit the number of different shapes kept 
> with a simple LRA (Least Recently Added) strategy that would not hurt access 
> performance.

That would be great.

-Chris.

Re: RFR: 8247532: Records deserialization is slow

2020-06-17 Thread Peter Levart



On 6/16/20 5:15 PM, Chris Hegarty wrote:

The caching is on a per-stream-field shape basis, which should be consistent in 
the common case, but of course is not always guaranteed to be the case, hence 
the need for the cache. I think that this should be fine, since the 
ObjectStreamClass ( that holds the cache ) is already itself cached as a weak 
reference. But I did wonder if the size of this new cache should be limited. 
Probably not worth the complexity unless it is an obvious issue.


I don't think there will normally be many different shapes of the same 
class deserialized by a single VM. Each shape means that a different 
version of that class must have existed to serialize it. There could be 
deliberate "forged" streams trying to inflate the cache. Are you worried 
about that? In that case I can add logic to limit the number of 
different shapes kept with a simple LRA (Least Recently Added) strategy 
that would not hurt access performance.



Regards, Peter





Re: RFR: 8247532: Records deserialization is slow

2020-06-16 Thread Peter Levart

Hi Chris,


On 6/17/20 12:31 AM, Peter Levart wrote:
I did however notice that there is just a single test, 
DefaultValuesTest, that exercises different stream shapes for the 
same class in the stream. Would be good to expand coverage in this 
area, but of course some lower-level test-specific building blocks 
will be needed to help build the specially crafted byte streams - I can 
help with this.


Overall I think that this change is good.

-Chris.

If there is a way for a test to compile several versions of the same 
(record) class and then produce byte streams with it, then we could 
simulate various class-evolution cases (field missing, surplus field, 
change of field type, etc...) without crafting the bytestream by hand. 
WDYT? 



I have a better idea. The test could contain several different classes 
with different names that would be used to generate variations of the 
serialized stream. Such streams could then be "patched" so they would 
contain the common target class name and then used to (attempt to) 
deserialize the target class. I'll try to devise such a test...



Regards, Peter




Re: RFR: 8247532: Records deserialization is slow

2020-06-16 Thread Peter Levart

Hi Remi,


The keys used are based on the ordered names and types of "fields" in 
the ObjectStreamClass that has been deserialized as part of the object 
stream for the record "type" being deserialized. So for each record type 
(Class<?>) there can be several distinct 
ObjectStreamClasses deserialized in the same VM that were produced by 
serializing different versions of record types in different VMs or 
ClassLoader(s) and that may vary in the names and/or types and/or order 
of fields. And since the MethodHandle chain of transformations that is 
being used for a particular ObjectStreamClass is dependent on the 
ordered names and types of ObjectStreamField(s) of deserialized 
ObjectStreamClass, the key is based on that too.


You may ask why not simply add a MethodHandle field to the 
ObjectStreamClass instance that is deserialized and therefore unique? 
Well, such ObjectStreamClass only lasts for one "session" of 
ObjectStream deserialization, so such caching wouldn't be efficient. But 
common parts of that ObjectStreamClass that are only dependent on the 
current local-VM type are cached in a special instance of 
ObjectStreamClass and that's where I put a cache of deserialization 
constructors in the form of a Map.
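To make that concrete, a sketch of what such a key could look like 
(illustrative names only, not the webrev code): the ordered field names 
and types of the deserialized ObjectStreamClass are captured in a 
value-based key:

import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.util.List;

// A value-based key capturing the ordered field names and types of the
// deserialized ObjectStreamClass, so streams produced by different
// versions of the same record class map to different cache entries.
record ShapeKey(List<String> names, List<Class<?>> types) {
    static ShapeKey of(ObjectStreamClass desc) {
        ObjectStreamField[] fields = desc.getFields();
        String[] n = new String[fields.length];
        Class<?>[] t = new Class<?>[fields.length];
        for (int i = 0; i < fields.length; i++) {
            n[i] = fields[i].getName();
            t[i] = fields[i].getType();
        }
        return new ShapeKey(List.of(n), List.of(t));
    }
}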



Regards, Peter


On 6/17/20 1:06 AM, Remi Forax wrote:

Hi Peter,
is there a reason not to use a ClassValue using the record class 
as key instead of the more complex keys you are proposing?

Rémi

- Original Message -

From: "Chris Hegarty" 
To: "Peter Levart" 
Cc: "core-libs-dev" 
Sent: Tuesday, 16 June 2020 17:15:03
Subject: Re: RFR: 8247532: Records deserialization is slow
Hi Peter,


On 14 Jun 2020, at 17:28, Peter Levart  wrote:

...

[2]
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.01/


I think that this is very good. It’s clever to build a chain of method handles
that can be invoked with the stream field values. This is a nice separation
between the shape of the data and the actual stream data.

The caching is on a per-stream-field shape basis, which should be consistent in
the common case, but of course is not always guaranteed to be the case, hence
the need for the cache. I think that this should be fine, since the
ObjectStreamClass ( that holds the cache ) is already itself cached as a weak
reference. But I did wonder if the size of this new cache should be limited.
Probably not worth the complexity unless it is an obvious issue.

All the serializable records tests pass successfully with your patch. Good. I
did however notice that there is just a single test, DefaultValuesTest, that
exercises different stream shapes for the same class in the stream. Would be
good to expand coverage in this area, but of course some lower-level
test-specific building blocks will be needed to help build the specially
crafted byte streams - I can help with this.

Overall I think that this change is good.

-Chris.


Re: RFR: 8247532: Records deserialization is slow

2020-06-16 Thread Paul Sandoz



> On Jun 16, 2020, at 3:36 PM, Peter Levart  wrote:
> 
> Yes, or even better: MethodHandles.empty(MethodType.methodType(pClass, 
> byte[].class, Object[].class)), which was suggested by Johannes Kuhn, as here:
> 
> 
> http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.02/
> 

Ah yes much better.
Paul.



Re: RFR: 8247532: Records deserialization is slow

2020-06-16 Thread Remi Forax
Hi Peter,
is there a reason not to use a ClassValue using the record class 
as key instead of the more complex keys you are proposing?

Rémi

- Original Message -
> From: "Chris Hegarty" 
> To: "Peter Levart" 
> Cc: "core-libs-dev" 
> Sent: Tuesday, 16 June 2020 17:15:03
> Subject: Re: RFR: 8247532: Records deserialization is slow

> Hi Peter,
> 
>> On 14 Jun 2020, at 17:28, Peter Levart  wrote:
>> 
>> ...
>> 
>> [2]
>> http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.01/
>> 
> I think that this is very good. It’s clever to build a chain of method handles
> that can be invoked with the stream field values. This is a nice separation
> between the shape of the data and the actual stream data.
> 
> The caching is on a per-stream-field shape basis, which should be consistent 
> in
> the common case, but of course is not always guaranteed to be the case, hence
> the need for the cache. I think that this should be fine, since the
> ObjectStreamClass ( that holds the cache ) is already itself cached as a weak
> reference. But I did wonder if the size of this new cache should be limited.
> Probably not worth the complexity unless it is an obvious issue.
> 
> All the serializable records tests pass successfully with your patch. Good. I
> did however notice that there is just a single test, DefaultValuesTest, that
> exercises different stream shapes for the same class in the stream. Would be
> good to expand coverage in this area, but of course some lower-level
> test-specific building blocks will be needed to help build the specially
> crafted byte streams - I can help with this.
> 
> Overall I think that this change is good.
> 
> -Chris.


Re: RFR: 8247532: Records deserialization is slow

2020-06-16 Thread Peter Levart

Hi Paul,

On 6/16/20 5:50 PM, Paul Sandoz wrote:

Hi Peter,

  199 private Map deserializationCtrs;

Change to be either ConcurrentMap or ConcurrentHashMap, thereby making it 
clearer when using.



Sure. Will do that.





 2679 private static MethodHandle defaultValueExtractorFor(Class<?> pType) {

Can the implementation use MethodHandles.zero? e.g.

    return MethodHandles.dropArguments(MethodHandles.zero(pType), 0,
                                       byte[].class, Object[].class);
  
Paul.



Yes, or even better: MethodHandles.empty(MethodType.methodType(pClass, 
byte[].class, Object[].class)), which was suggested by Johannes Kuhn, as here:



http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.02/


Regards, Peter





On Jun 16, 2020, at 8:15 AM, Chris Hegarty  wrote:

Hi Peter,


On 14 Jun 2020, at 17:28, Peter Levart  wrote:

...

[2] 
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.01/


I think that this is very good. It’s clever to build a chain of method handles 
that can be invoked with the stream field values. This is a nice separation 
between the shape of the data and the actual stream data.

The caching is on a per-stream-field shape basis, which should be consistent in 
the common case, but of course is not always guaranteed to be the case, hence 
the need for the cache. I think that this should be fine, since the 
ObjectStreamClass ( that holds the cache ) is already itself cached as a weak 
reference. But I did wonder if the size of this new cache should be limited. 
Probably not worth the complexity unless it is an obvious issue.

All the serializable records tests pass successfully with your patch. Good. I 
did however notice that there is just a single test, DefaultValuesTest, that 
exercises different stream shapes for the same class in the stream. Would be 
good to expand coverage in this area, but of course some lower-level 
test-specific building blocks will be needed to help build the specially 
crafted byte streams - I can help with this.

Overall I think that this change is good.

-Chris.



Re: RFR: 8247532: Records deserialization is slow

2020-06-16 Thread Peter Levart

Hi Chris,

On 6/16/20 5:15 PM, Chris Hegarty wrote:

Hi Peter,


On 14 Jun 2020, at 17:28, Peter Levart  wrote:

...

[2] 
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.01/


I think that this is very good. It’s clever to build a chain of method handles 
that can be invoked with the stream field values. This is a nice separation 
between the shape of the data and the actual stream data.

The caching is on a per-stream-field shape basis, which should be consistent in 
the common case, but of course is not always guaranteed to be the case, hence 
the need for the cache. I think that this should be fine, since the 
ObjectStreamClass ( that holds the cache ) is already itself cached as a weak 
reference. But I did wonder if the size of this new cache should be limited. 
Probably not worth the complexity unless it is an obvious issue.

All the serializable records tests pass successfully with your patch. Good. I 
did however notice that there is just a single test, DefaultValuesTest, that 
exercises different stream shapes for the same class in the stream. Would be 
good to expand coverage in this area, but of course some lower-level 
test-specific building blocks will be needed to help build the specially 
crafted byte streams - I can help with this.

Overall I think that this change is good.

-Chris.

If there is a way for a test to compile several versions of the same 
(record) class and then produce byte streams with it, then we could 
simulate various class-evolution cases (field missing, surplus field, 
change of field type, etc...) without crafting the bytestream by hand. WDYT?



Regards, Peter




Re: RFR: 8247532: Records deserialization is slow

2020-06-16 Thread Paul Sandoz
Hi Peter,

 199 private Map deserializationCtrs;

Change to be either ConcurrentMap or ConcurrentHashMap, thereby making it 
clearer when using.


 2679 private static MethodHandle defaultValueExtractorFor(Class<?> pType) {

Can the implementation use MethodHandles.zero? e.g.

    return MethodHandles.dropArguments(MethodHandles.zero(pType), 0,
                                       byte[].class, Object[].class);
 
Paul.

> On Jun 16, 2020, at 8:15 AM, Chris Hegarty  wrote:
> 
> Hi Peter,
> 
>> On 14 Jun 2020, at 17:28, Peter Levart  wrote:
>> 
>> ...
>> 
>> [2] 
>> http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.01/
>> 
> I think that this is very good. It’s clever to build a chain of method 
> handles that can be invoked with the stream field values. This is a nice 
> separation between the shape of the data and the actual stream data.
> 
> The caching is on a per-stream-field shape basis, which should be consistent 
> in the common case, but of course is not always guaranteed to be the case, 
> hence the need for the cache. I think that this should be fine, since the 
> ObjectStreamClass ( that holds the cache ) is already itself cached as a weak 
> reference. But I did wonder if the size of this new cache should be limited. 
> Probably not worth the complexity unless it is an obvious issue.
> 
> All the serializable records tests pass successfully with your patch. Good. I 
> did however notice that there is just a single test, DefaultValuesTest, that 
> exercises different stream shapes for the same class in the stream. Would be 
> good to expand coverage in this area, but of course some lower-level 
> test-specific building blocks will be needed to help build the specially 
> crafted byte streams - I can help with this.
> 
> Overall I think that this change is good.
> 
> -Chris.
> 



Re: RFR: 8247532: Records deserialization is slow

2020-06-16 Thread Chris Hegarty
Hi Peter,

> On 14 Jun 2020, at 17:28, Peter Levart  wrote:
> 
> ...
> 
> [2] 
> http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.01/
> 
I think that this is very good. It’s clever to build a chain of method handles 
that can be invoked with the stream field values. This is a nice separation 
between the shape of the data and the actual stream data.

The caching is on a per-stream-field shape basis, which should be consistent in 
the common case, but of course is not always guaranteed to be the case, hence 
the need for the cache. I think that this should be fine, since the 
ObjectStreamClass ( that holds the cache ) is already itself cached as a weak 
reference. But I did wonder if the size of this new cache should be limited. 
Probably not worth the complexity unless it is an obvious issue.

All the serializable records tests pass successfully with your patch. Good. I 
did however notice that there is just a single test, DefaultValuesTest, that 
exercises different stream shapes for the same class in the stream. Would be 
good to expand coverage in this area, but of course some lower-level 
test-specific building blocks will be needed to help build the specially 
crafted byte streams - I can help with this.

Overall I think that this change is good.

-Chris.



Re: RFR: 8247532: Records deserialization is slow

2020-06-15 Thread Chris Hegarty



> On 15 Jun 2020, at 10:06, Peter Levart  wrote:
> 
> Hi Chris,
> 
> Maybe I stamped "Enhancement" just because it does not represent a logical 
> bug, but a 16x regression compared to the same feature with classes could be 
> characterized as a performance "bug", right?

That would be the categorisation that I would use.

-Chris.

> 
> Peter
> 
> On 6/15/20 10:37 AM, Chris Hegarty wrote:
>> Hi Peter,
>> 
>> Thank you for doing this, it is really appreciated. I’m going to take this 
>> patch and run it in my local environment, after which I’ll post a reply here.
>> 
>> > So WDYT? Since this is still a preview feature in JDK15, is it possible to 
>> > squeeze it into JDK15?
>> 
>> I think that such a change is a reasonable candidate for consideration in 
>> RDP 1. I see that the issue in JIRA is currently an Enhancement. Is it 
>> really an enhancement? If so, then it can follow the LERP [1]. Otherwise, if 
>> a regular bug between P1-P3 (inclusive), it can be integrated when reviewed 
>> and ready.
>> 
>> -Chris.
>> 
>> [1] https://openjdk.java.net/jeps/3#Late-Enhancement-Request-Process 
>> 
>> 
>>> On 14 Jun 2020, at 22:34, Peter Levart wrote:
>>> 
>>> So, here's a new webrev incorporating Johannes' suggestion:
>>> 
>>> 
>>> http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.02/
>>>  
>>> 
>>> 
>> 
> 



Re: RFR: 8247532: Records deserialization is slow

2020-06-15 Thread Peter Levart

Hi Chris,

Maybe I stamped "Enhancement" just because it does not represent a 
logical bug, but a 16x regression compared to the same feature with 
classes could be characterized as a performance "bug", right?


Peter

On 6/15/20 10:37 AM, Chris Hegarty wrote:

Hi Peter,

Thank you for doing this, it is really appreciated. I’m going to take 
this patch and run it in my local environment, after which I’ll post a 
reply here.


> So WDYT? Since this is still a preview feature in JDK15, is it 
possible to squeeze it into JDK15?


I think that such a change is a reasonable candidate for consideration 
in RDP 1. I see that the issue in JIRA is currently an Enhancement. Is 
it really an enhancement? If so, then it can follow the LERP [1]. 
Otherwise, if a regular bug between P1-P3 (inclusive), it can be 
integrated when reviewed and ready.


-Chris.

[1] https://openjdk.java.net/jeps/3#Late-Enhancement-Request-Process

On 14 Jun 2020, at 22:34, Peter Levart wrote:


So, here's a new webrev incorporating Johannes' suggestion:


http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.02/







Re: RFR: 8247532: Records deserialization is slow

2020-06-15 Thread Chris Hegarty
Hi Peter,

Thank you for doing this, it is really appreciated. I’m going to take this 
patch and run it in my local environment, after which I’ll post a reply here.

> So WDYT? Since this is still a preview feature in JDK15, is it possible to 
> squeeze it into JDK15?

I think that such a change is a reasonable candidate for consideration in RDP 
1. I see that the issue in JIRA is currently an Enhancement. Is it really an 
enhancement? If so, then it can follow the LERP [1]. Otherwise, if a regular 
bug between P1-P3 (inclusive), it can be integrated when reviewed and ready.

-Chris.

[1] https://openjdk.java.net/jeps/3#Late-Enhancement-Request-Process

> On 14 Jun 2020, at 22:34, Peter Levart  wrote:
> 
> So, here's a new webrev incorporating Johannes' suggestion:
> 
> 
> http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.02/
> 



Re: RFR: 8247532: Records deserialization is slow

2020-06-14 Thread Peter Levart

So, here's a new webrev incorporating Johannes' suggestion:


http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.02/


Peter

On 6/14/20 11:17 PM, Peter Levart wrote:


On 6/14/20 7:03 PM, Johannes Kuhn wrote:
Small suggestion: RecordSupport.defaultValueExtractorFor could be 
written as:


    return MethodHandles.empty(MethodType.methodType(pClass,
            byte[].class, Object[].class));


It could then be inlined.

- Johannes 



Thanks, Johannes. There is also a MethodHandles.zero(Class<?> type) 
method, as I found out.
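The two constructions are equivalent; a small self-contained check (just 
a demo, not the patch code):

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DefaultValueExtractors {
    public static void main(String[] args) throws Throwable {
        // one step: (byte[], Object[]) -> int, ignores arguments, returns 0
        MethodHandle viaEmpty = MethodHandles.empty(
            MethodType.methodType(int.class, byte[].class, Object[].class));

        // equivalent two-step construction with zero + dropArguments
        MethodHandle viaZero = MethodHandles.dropArguments(
            MethodHandles.zero(int.class), 0, byte[].class, Object[].class);

        System.out.println((int) viaEmpty.invokeExact(new byte[0], new Object[0])); // 0
        System.out.println((int) viaZero.invokeExact(new byte[0], new Object[0]));  // 0
    }
}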



Regards, Peter




Re: RFR: 8247532: Records deserialization is slow

2020-06-14 Thread Peter Levart



On 6/14/20 7:03 PM, Johannes Kuhn wrote:
Small suggestion: RecordSupport.defaultValueExtractorFor could be 
written as:


    return MethodHandles.empty(MethodType.methodType(pClass,
            byte[].class, Object[].class));


It could then be inlined.

- Johannes 



Thanks, Johannes. There is also a MethodHandles.zero(Class<?> type) 
method, as I found out.



Regards, Peter




Re: RFR: 8247532: Records deserialization is slow

2020-06-14 Thread Johannes Kuhn

On 14-Jun-20 18:28, Peter Levart wrote:

Hi,


I noticed that deserializing records (a new preview Java feature in 
JDK14 and JDK15) is slow compared to equivalent classical classes. I 
created a JMH benchmark [1] to see how it compares (ran it on JDK14):



Benchmark                                                                      Mode  Cnt       Score        Error   Units
RecordSerializationBench.deserializeClasses                                    avgt   10      31.911 ±      0.218   us/op
RecordSerializationBench.deserializeClasses:·gc.alloc.rate                     avgt   10     815.106 ±      5.545  MB/sec
RecordSerializationBench.deserializeClasses:·gc.alloc.rate.norm                avgt   10   40921.903 ±      0.615    B/op
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Eden_Space            avgt   10     839.522 ±    191.195  MB/sec
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Eden_Space.norm       avgt   10   42153.661 ±   9682.799    B/op
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Survivor_Space        avgt   10       0.117 ±      0.492  MB/sec
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Survivor_Space.norm   avgt   10       5.835 ±     24.447    B/op
RecordSerializationBench.deserializeClasses:·gc.count                          avgt   10      21.000               counts
RecordSerializationBench.deserializeClasses:·gc.time                           avgt   10      17.000                   ms
RecordSerializationBench.deserializeRecords                                    avgt   10     531.333 ±      3.094   us/op
RecordSerializationBench.deserializeRecords:·gc.alloc.rate                     avgt   10     346.511 ±      1.997  MB/sec
RecordSerializationBench.deserializeRecords:·gc.alloc.rate.norm                avgt   10  289637.193 ±      6.894    B/op
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Eden_Space            avgt   10     359.773 ±    191.116  MB/sec
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Eden_Space.norm       avgt   10  300657.838 ± 159724.346    B/op
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Survivor_Space        avgt   10       0.007 ±      0.012  MB/sec
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Survivor_Space.norm   avgt   10       6.020 ±      9.910    B/op
RecordSerializationBench.deserializeRecords:·gc.count                          avgt   10       9.000               counts
RecordSerializationBench.deserializeRecords:·gc.time                           avgt   10      14.000                   ms



...not only is it about 16x slower, it also produces about 7x more 
garbage. I checked the code and it is not very optimal. It matches 
record component names with object stream fields in an O(n^2) way. It 
uses method handles, but it binds the arguments of the canonical 
constructor each time a record instance is constructed. So I tried to 
optimize it [2], and with that patch on top of JDK16 the benchmark 
produces the following results:



Benchmark                                                                      Mode  Cnt      Score       Error   Units
RecordSerializationBench.deserializeClasses                                    avgt   10     31.049 ±     0.235   us/op
RecordSerializationBench.deserializeClasses:·gc.alloc.rate                     avgt   10    837.614 ±     6.326  MB/sec
RecordSerializationBench.deserializeClasses:·gc.alloc.rate.norm                avgt   10  40921.931 ±     0.666    B/op
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Eden_Space            avgt   10    867.743 ±   251.373  MB/sec
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Eden_Space.norm       avgt   10  42405.532 ± 12403.301    B/op
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Survivor_Space        avgt   10      0.126 ±     0.478  MB/sec
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Survivor_Space.norm   avgt   10      6.113 ±    23.268    B/op
RecordSerializationBench.deserializeClasses:·gc.count                          avgt   10     22.000              counts
RecordSerializationBench.deserializeClasses:·gc.time                           avgt   10     19.000                  ms
RecordSerializationBench.deserializeRecords                                    avgt   10     33.588 ±     0.394   us/op
RecordSerializationBench.deserializeRecords:·gc.alloc.rate                     avgt   10    500.033 ±     5.871  MB/sec
RecordSerializationBench.deserializeRecords:·gc.alloc.rate.norm                avgt   10  26425.293 ±     0.759    B/op
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Eden_Space            avgt   10    512.772 ±   288.112  MB/sec
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Eden_Space.norm       avgt   10  27090.499 ± 15175.280    B/op
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Survivor_Space        avgt   10      0.134 ±     0.496  MB/sec
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Survivor_Space.norm   avgt   10      7.128 ±    26.526    B/op
RecordSerializationBench.deserializeRecords:·gc.count                          avgt   10     13.000              counts
RecordSerializationBench.deserializeRecords:·gc.time                           avgt   10     17.000                  ms



...so here the speed is comparable and it even produces less garbage.
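
[Purely to illustrate the direction of such an optimization -- a 
hypothetical sketch with made-up names, not the actual webrev code [2]: 
match field names once per stream class using a map instead of a nested 
scan, and adapt the canonical constructor handle once, so constructing 
each instance is a single invoke with no per-instance argument binding. 
It assumes stream fields and record components correspond one-to-one; 
defaulting of unmatched components is elided:]

    import java.lang.invoke.MethodHandle;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical helper: all the expensive work happens once per
    // (record class, stream layout) pair and is then reused.
    final class RecordInstantiator {
        private final MethodHandle spreader; // (Object[]) -> record instance
        private final int[] reorder;         // stream position -> ctor position

        RecordInstantiator(MethodHandle canonicalCtor,
                           String[] ctorParamNames, String[] streamFieldNames) {
            // O(n) name matching via a map instead of an O(n^2) nested scan:
            Map<String, Integer> ctorPos = new HashMap<>();
            for (int i = 0; i < ctorParamNames.length; i++)
                ctorPos.put(ctorParamNames[i], i);
            reorder = new int[streamFieldNames.length];
            for (int i = 0; i < streamFieldNames.length; i++)
                reorder[i] = ctorPos.get(streamFieldNames[i]);
            // Adapt the constructor handle once; no bindTo/insertArguments
            // per deserialized instance:
            spreader = canonicalCtor.asSpreader(Object[].class,
                                                ctorParamNames.length);
        }

        Object newInstance(Object[] streamValues) throws Throwable {
            Object[] ctorArgs = new Object[reorder.length];
            for (int i = 0; i < streamValues.length; i++)
                ctorArgs[reorder[i]] = streamValues[i];
            return spreader.invoke(ctorArgs); // one call per instance
        }
    }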


I created an issue [3].

So WDYT? Since this is still a preview feature in JDK15, is it 
possible to squeeze it into JDK15?


Regards, Peter


[1] 
http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/RecordSerializationBench.java


[2] 

RFR: 8247532: Records deserialization is slow

2020-06-14 Thread Peter Levart

Hi,


I noticed that deserializing records (new preview Java feature in JDK14 
and JDK15) is slow compared to equivalent classical classes. I created a 
JMH benchmark [1] to see how it compares (ran it on JDK14; a sketch of 
the kind of record/class pair being compared follows the results table):



Benchmark                                                                      Mode  Cnt       Score        Error   Units
RecordSerializationBench.deserializeClasses                                    avgt   10      31.911 ±      0.218   us/op
RecordSerializationBench.deserializeClasses:·gc.alloc.rate                     avgt   10     815.106 ±      5.545  MB/sec
RecordSerializationBench.deserializeClasses:·gc.alloc.rate.norm                avgt   10   40921.903 ±      0.615    B/op
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Eden_Space            avgt   10     839.522 ±    191.195  MB/sec
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Eden_Space.norm       avgt   10   42153.661 ±   9682.799    B/op
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Survivor_Space        avgt   10       0.117 ±      0.492  MB/sec
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Survivor_Space.norm   avgt   10       5.835 ±     24.447    B/op
RecordSerializationBench.deserializeClasses:·gc.count                          avgt   10      21.000               counts
RecordSerializationBench.deserializeClasses:·gc.time                           avgt   10      17.000                   ms
RecordSerializationBench.deserializeRecords                                    avgt   10     531.333 ±      3.094   us/op
RecordSerializationBench.deserializeRecords:·gc.alloc.rate                     avgt   10     346.511 ±      1.997  MB/sec
RecordSerializationBench.deserializeRecords:·gc.alloc.rate.norm                avgt   10  289637.193 ±      6.894    B/op
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Eden_Space            avgt   10     359.773 ±    191.116  MB/sec
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Eden_Space.norm       avgt   10  300657.838 ± 159724.346    B/op
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Survivor_Space        avgt   10       0.007 ±      0.012  MB/sec
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Survivor_Space.norm   avgt   10       6.020 ±      9.910    B/op
RecordSerializationBench.deserializeRecords:·gc.count                          avgt   10       9.000               counts
RecordSerializationBench.deserializeRecords:·gc.time                           avgt   10      14.000                   ms
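
[As a sketch of the kind of record/class pair such a benchmark compares 
-- hypothetical shapes; the real benchmark classes are in [1]:]

    import java.io.Serializable;

    // A record and a hand-written "classical" equivalent carrying the
    // same serializable state:
    record PointRecord(int x, int y) implements Serializable {}

    final class PointClass implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int x;
        private final int y;
        PointClass(int x, int y) { this.x = x; this.y = y; }
    }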


...not only is it about 16x slower, it also produces about 7x more 
garbage. I checked the code and it is not very optimal. It matches 
record component names with object stream fields in an O(n^2) way. It 
uses method handles, but it binds the arguments of the canonical 
constructor each time a record instance is constructed. So I tried to 
optimize it [2], and with that patch on top of JDK16 the benchmark 
produces the following results:



Benchmark                                                                      Mode  Cnt      Score       Error   Units
RecordSerializationBench.deserializeClasses                                    avgt   10     31.049 ±     0.235   us/op
RecordSerializationBench.deserializeClasses:·gc.alloc.rate                     avgt   10    837.614 ±     6.326  MB/sec
RecordSerializationBench.deserializeClasses:·gc.alloc.rate.norm                avgt   10  40921.931 ±     0.666    B/op
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Eden_Space            avgt   10    867.743 ±   251.373  MB/sec
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Eden_Space.norm       avgt   10  42405.532 ± 12403.301    B/op
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Survivor_Space        avgt   10      0.126 ±     0.478  MB/sec
RecordSerializationBench.deserializeClasses:·gc.churn.G1_Survivor_Space.norm   avgt   10      6.113 ±    23.268    B/op
RecordSerializationBench.deserializeClasses:·gc.count                          avgt   10     22.000              counts
RecordSerializationBench.deserializeClasses:·gc.time                           avgt   10     19.000                  ms
RecordSerializationBench.deserializeRecords                                    avgt   10     33.588 ±     0.394   us/op
RecordSerializationBench.deserializeRecords:·gc.alloc.rate                     avgt   10    500.033 ±     5.871  MB/sec
RecordSerializationBench.deserializeRecords:·gc.alloc.rate.norm                avgt   10  26425.293 ±     0.759    B/op
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Eden_Space            avgt   10    512.772 ±   288.112  MB/sec
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Eden_Space.norm       avgt   10  27090.499 ± 15175.280    B/op
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Survivor_Space        avgt   10      0.134 ±     0.496  MB/sec
RecordSerializationBench.deserializeRecords:·gc.churn.G1_Survivor_Space.norm   avgt   10      7.128 ±    26.526    B/op