eejbyfeldt commented on PR #38428:
URL: https://github.com/apache/spark/pull/38428#issuecomment-1410514876

   > Wait, am I reading this wrong, or is the new approach slower? 
useIterator=true cases seem to take a tiny bit longer. If that's right, what's 
the benefit?
   
   I think you are reading it wrong. Yes comparing `useIterator=true` to 
`useIterator=false` it probably a bit slower. (This could be that the 
useIterator=false use a correctly sized array we collecting the data or just 
that there is overhead in using the iterator) But that is not the point of the 
comparison/benchmark. 
   
   The goal of this PR was to avoid throwing exceptions to indicate of stream 
as creating the exception (and adding the stacktrace) is fairly expensive for 
small streams. (I think having EOFException is also problematic for correctness 
as we might misinterpret an EOFException on e.g a truncated buffer and silently 
ignore missing data. But the initial review said that we wanted to keep the 
existing `DeserializationStream` so this behavior is kept here). The point of 
the benchmark was that adding the benchmark to a PR prior to this change had 
results like
   
   ```
   penJDK 64-Bit Server VM 17.0.5+8 on Linux 5.15.0-1031-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Benchmark of kryo asIterator on deserialization stream:        Best Time(ms) 
  Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
---------------------------------------------------------------------------------------------------------------------------------------------
   Colletion of int with 1 elements, useIterator: true                      113 
           114           1          0.1       11274.8       1.0X
   Colletion of int with 10 elements, useIterator: true                     125 
           126           0          0.1       12547.7       0.9X
   Colletion of int with 100 elements, useIterator: true                    243 
           244           1          0.0       24324.0       0.5X
   Colletion of string with 1 elements, useIterator: true                   112 
           112           0          0.1       11207.8       1.0X
   Colletion of string with 10 elements, useIterator: true                  135 
           135           0          0.1       13459.3       0.8X
   Colletion of string with 100 elements, useIterator: true                 350 
           351           1          0.0       35010.5       0.3X
   Colletion of Array[int] with 1 elements, useIterator: true               112 
           113           1          0.1       11235.7       1.0X
   Colletion of Array[int] with 10 elements, useIterator: true              132 
           132           1          0.1       13167.5       0.9X
   Colletion of Array[int] with 100 elements, useIterator: true             334 
           334           1          0.0       33351.4       0.3X
   Colletion of int with 1 elements, useIterator: false                      14 
            14           0          0.7        1364.5       8.3X
   Colletion of int with 10 elements, useIterator: false                     24 
            25           0          0.4        2414.8       4.7X
   Colletion of int with 100 elements, useIterator: false                   132 
           132           0          0.1       13171.3       0.9X
   Colletion of string with 1 elements, useIterator: false                   15 
            15           0          0.7        1504.1       7.5X
   Colletion of string with 10 elements, useIterator: false                  36 
            37           1          0.3        3633.5       3.1X
   Colletion of string with 100 elements, useIterator: false                242 
           242           1          0.0       24161.7       0.5X
   Colletion of Array[int] with 1 elements, useIterator: false               15 
            15           0          0.7        1457.2       7.7X
   Colletion of Array[int] with 10 elements, useIterator: false              32 
            33           0          0.3        3222.4       3.5X
   Colletion of Array[int] with 100 elements, useIterator: false            219 
           219           0          0.0       21873.1       0.5X
   ```
   
   And in this branch:
   ```
   OpenJDK 64-Bit Server VM 17.0.5+8 on Linux 5.15.0-1031-azure
   Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
   Benchmark of kryo asIterator on deserialization stream:        Best Time(ms) 
  Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   
---------------------------------------------------------------------------------------------------------------------------------------------
   Colletion of int with 1 elements, useIterator: true                       22 
            25           2          0.5        2160.1       1.0X
   Colletion of int with 10 elements, useIterator: true                      37 
            41           2          0.3        3742.8       0.6X
   Colletion of int with 100 elements, useIterator: true                    192 
           199           5          0.1       19184.5       0.1X
   Colletion of string with 1 elements, useIterator: true                    24 
            26           1          0.4        2384.5       0.9X
   Colletion of string with 10 elements, useIterator: true                   51 
            58           3          0.2        5092.8       0.4X
   Colletion of string with 100 elements, useIterator: true                 338 
           355          12          0.0       33833.5       0.1X
   Colletion of Array[int] with 1 elements, useIterator: true                24 
            28           5          0.4        2372.5       0.9X
   Colletion of Array[int] with 10 elements, useIterator: true               48 
            53           3          0.2        4841.6       0.4X
   Colletion of Array[int] with 100 elements, useIterator: true             341 
           356           8          0.0       34118.0       0.1X
   Colletion of int with 1 elements, useIterator: false                      21 
            25           3          0.5        2103.1       1.0X
   Colletion of int with 10 elements, useIterator: false                     37 
            42           3          0.3        3729.3       0.6X
   Colletion of int with 100 elements, useIterator: false                   171 
           181          10          0.1       17105.7       0.1X
   Colletion of string with 1 elements, useIterator: false                   24 
            27           2          0.4        2410.6       0.9X
   Colletion of string with 10 elements, useIterator: false                  52 
            57           3          0.2        5213.2       0.4X
   Colletion of string with 100 elements, useIterator: false                343 
           354           8          0.0       34284.3       0.1X
   Colletion of Array[int] with 1 elements, useIterator: false               23 
            26           2          0.4        2339.2       0.9X
   Colletion of Array[int] with 10 elements, useIterator: false              46 
            53           7          0.2        4637.8       0.5X
   Colletion of Array[int] with 100 elements, useIterator: false            302 
           326          12          0.0       30210.0       0.1X
   ```
   
   The second run was on significantly slower cpu so the time can not be 
compared directly between them. But the main take away is that by not throwing 
an exception (it was actually two exceptions as we raised a different after 
catching the one from kryo) when reaching the end of the stream, we lowered the 
performance difference between using the iterator interface and not using it. 
   
   If we could avoid using the iterator interface that would be better. But the 
way it avoid in the benchmark is that it leverages that it knows how many items 
to expect in the stream, but I believe that in our normal use cases this 
information is usually not available.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to