HeartSaVioR commented on issue #25853: [SPARK-21869][SS] Apply Apache Commons Pool to Kafka producer
URL: https://github.com/apache/spark/pull/25853#issuecomment-548264521
 
 
   I've also run the test on a (private) cloud VM. It's not a "dedicated" instance, so it can still be affected by other tenants, but it's much better isolated than my local dev machine.
   
   This time I ran the test 5 times per target (master and SPARK-21869) and used a broader range of batches (batches 100 to 199).
   
   > addBatch duration in ms (lower is better)
   
   commit | trial# | max | min | mean | median | perc 90 | perc 95 | perc 99
   ------- | ----- | ----- | --- | ---- | ------- | --------- | ------- | --------
   master | 1 | 977 | 582 | 720.2 | 680 | 907.4 | 921 | 954.23
   master | 2 | 3146 | 591 | 1136.87 | 939.5 | 1973.8 | 2387 | 2863.85
   master | 3 | 3936 | 585 | 1093.84 | 810 | 1889.4 | 3235.8 | 3780.57
   master | 4 | 1228 | 610 | 735.38 | 709.5 | 890 | 973.5 | 1080.49
   master | 5 | 1792 | 583 | 852.18 | 820.5 | 1173.8 | 1235.7 | 1291.06
   SPARK-21869 | 1 | 375 | 222 | 258.83 | 255 | 278.2 | 306.75 | 364.11
   SPARK-21869 | 2 | 338 | 219 | 252.54 | 252 | 269.1 | 283.25 | 310.28
   SPARK-21869 | 3 | 446 | 213 | 258.78 | 258 | 276.1 | 282.3 | 301.46
   SPARK-21869 | 4 | 374 | 219 | 261.18 | 261.5 | 278.1 | 287.3 | 304.7
   SPARK-21869 | 5 | 335 | 216 | 249.43 | 245 | 273.2 | 280.05 | 316.19
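
For reference, here is roughly how per-batch addBatch durations could be collapsed into one table row. This is a minimal sketch, not the perf test code: the comment doesn't say how the percentiles were computed, so linear interpolation between the closest ranks (NumPy's default method) is an assumption, and `percentile` and `summarize` are hypothetical helper names.

```python
import math
from statistics import mean


def percentile(sorted_values, p):
    # Assumed method: linear interpolation between the two closest ranks
    # (the same rule as NumPy's default). sorted_values must be ascending.
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def summarize(durations_ms):
    # Hypothetical helper: turn one trial's per-batch durations
    # (e.g. batches 100 to 199) into the columns of one table row.
    values = sorted(durations_ms)
    return {
        "max": values[-1],
        "min": values[0],
        "mean": round(mean(values), 2),
        "median": percentile(values, 50),
        "perc 90": percentile(values, 90),
        "perc 95": percentile(values, 95),
        "perc 99": percentile(values, 99),
    }
```

With 100 batches per trial, the median under this rule is the average of the 50th and 51st sorted values, which would be consistent with half-millisecond medians such as 939.5 and 261.5.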
   
   > processedRowsPerSecond (higher is better)
   
   commit | trial# | max | min | mean | median | perc 90 | perc 95 | perc 99
   ------- | ----- | ----- | --- | ---- | ------- | --------- | ------- | --------
   master | 1 | 1404056.16 | 867888.13 | 1179292.12 | 1212938.00 | 1367989.34 | 1378254.21 | 1399732.63
   master | 2 | 1391035.54 | 677710.84 | 976281.35 | 960013.38 | 1213926.81 | 1271141.22 | 1340790.47
   master | 3 | 1440000 | 619408.12 | 1061200.45 | 1098841.51 | 1313868.61 | 1331459.56 | 1385169.23
   master | 4 | 1345291.47 | 697674.41 | 1149405.16 | 1167332.84 | 1304726.99 | 1316079.29 | 1339345.77
   master | 5 | 1401869.15 | 670141.47 | 1054835.85 | 1030994.12 | 1367989.34 | 1374150.85 | 1389018.69
   SPARK-21869 | 1 | 3237410.07 | 2073732.71 | 2844976.22 | 2880007.37 | 3072724.29 | 3147405.22 | 3203192.60
   SPARK-21869 | 2 | 3214285.71 | 2244389.02 | 2879452.91 | 2893890.67 | 3082191.78 | 3128324.46 | 3202961.36
   SPARK-21869 | 3 | 3260869.56 | 1782178.21 | 2824598.32 | 2825752.64 | 3040540.54 | 3071672.35 | 3237644.66
   SPARK-21869 | 4 | 3249097.47 | 2050113.89 | 2796697.70 | 2786377.70 | 2981122.52 | 3093853.67 | 3180900.86
   SPARK-21869 | 5 | 3284671.53 | 2290076.33 | 2905926.30 | 2917349.64 | 3136984.96 | 3181343.76 | 3237882.68
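
To put the processedRowsPerSecond table into a single number, the per-trial means (the value right after max and min in each row) can be compared directly. This is only a rough summary under the assumption that every trial covers the same number of batches, so averaging the per-trial means is fair; it is not a statistical test, and the helper names are hypothetical.

```python
from statistics import mean

# Per-trial mean processedRowsPerSecond, copied from the table above
# (the value right after max and min in each row).
MASTER_MEANS = [1179292.12, 976281.35, 1061200.45, 1149405.16, 1054835.85]
PATCH_MEANS = [2844976.22, 2879452.91, 2824598.32, 2796697.70, 2905926.30]


def throughput_ratio(baseline, candidate):
    # Ratio of the averages of the per-trial means (candidate / baseline).
    return mean(candidate) / mean(baseline)


def trial_ratio_range(baseline, candidate):
    # Most pessimistic and most optimistic trial-vs-trial ratios.
    return min(candidate) / max(baseline), max(candidate) / min(baseline)


ratio = throughput_ratio(MASTER_MEANS, PATCH_MEANS)
low, high = trial_ratio_range(MASTER_MEANS, PATCH_MEANS)
print(f"average speedup: {ratio:.2f}x (range {low:.2f}x to {high:.2f}x)")
```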
   
   I ran the trials in an interleaved sequence (master -> SPARK-21869 -> master -> SPARK-21869 -> ...) so that the results would be less affected by load from other processes/VMs. (The SPARK-21869 numbers look stable.)
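
The interleaved ordering can be sketched as follows. This is a minimal illustration, assuming a hypothetical `run_trial(target, trial)` callback that runs one benchmark and returns a metric; how each trial is actually launched isn't described in the comment.

```python
from itertools import chain
from typing import Callable, Dict, List, Sequence, Tuple


def interleaved_schedule(targets: Sequence[str], trials: int) -> List[Tuple[str, int]]:
    # Round r runs trial r of every target before any target starts round r + 1,
    # so slowly drifting background load is spread across all targets.
    return list(chain.from_iterable(
        [(target, r) for target in targets] for r in range(1, trials + 1)
    ))


def run_all(targets: Sequence[str], trials: int,
            run_trial: Callable[[str, int], float]) -> Dict[str, List[float]]:
    # run_trial is a hypothetical hook that runs one benchmark trial.
    results: Dict[str, List[float]] = {t: [] for t in targets}
    for target, r in interleaved_schedule(targets, trials):
        results[target].append(run_trial(target, r))
    return results
```

Running all trials of one target first and then all trials of the other would let a noisy period on the VM land entirely on one side; alternating makes that less likely.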
   
   So, assuming the test results are valid, the patch performs even better than the current code: when multiple tasks use the same producer configuration, N non-shared connections show better performance and stability than 1 shared connection (median addBatch of roughly 245 to 262 ms vs. 680 to 940 ms on master, with much tighter 95th/99th percentiles).
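
To illustrate "N non-shared connections" vs. "1 shared connection", here is a minimal object-pool sketch in Python. It is not the patch's code and not the Apache Commons Pool API; `FakeProducer`, `ProducerPool` and `run_tasks` are hypothetical stand-ins. The point is only that each task borrows a producer exclusively for its write and returns it afterwards, so concurrent tasks never contend on the same producer, while idle producers are reused by later tasks.

```python
import queue
import threading
from contextlib import contextmanager


class FakeProducer:
    # Hypothetical stand-in for a Kafka producer; it just records what it sent.
    def __init__(self, producer_id):
        self.producer_id = producer_id
        self.sent = []

    def send(self, record):
        self.sent.append(record)


class ProducerPool:
    def __init__(self, factory):
        self._factory = factory
        self._idle = queue.LifoQueue()  # most recently returned producer first
        self._lock = threading.Lock()
        self.producers = []  # every producer ever created, for inspection

    @contextmanager
    def borrow(self):
        try:
            producer = self._idle.get_nowait()  # reuse an idle producer if any
        except queue.Empty:
            with self._lock:  # otherwise create a new one
                producer = self._factory(len(self.producers) + 1)
                self.producers.append(producer)
        try:
            yield producer  # exclusive to this borrower until returned
        finally:
            self._idle.put(producer)  # hand it back for later tasks


def run_tasks(pool, num_tasks, records_per_task):
    # Each task (thread) holds one producer for its whole write.
    def task(task_id):
        with pool.borrow() as producer:
            for i in range(records_per_task):
                producer.send((task_id, i))

    threads = [threading.Thread(target=task, args=(t,)) for t in range(num_tasks)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
```

A production pool (for example Commons Pool's `GenericKeyedObjectPool`) would also bound the number of objects and evict idle ones; this sketch omits both.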
   
   I'd appreciate it if someone could take a look at the perf test code and run it in their own environment.
