HeartSaVioR edited a comment on issue #25853: [SPARK-21869][SS] Apply Apache Commons Pool to Kafka producer
URL: https://github.com/apache/spark/pull/25853#issuecomment-548264521

I've also run the test on a (private) cloud VM. It's not a "dedicated" machine, so the results can still be affected by other workloads, but its "isolation" is much better than my local dev environment.

This time I ran the test 5 times per test target (master, SPARK-21869) and used a broader range of batches (100 ~ 199).

* rate-row-per-second: 900000
* rate-ramp-up-time-second: 30
* num-partitions: 10

> addBatch (ms, lower is better)

commit | trial# | max | min | avg | median | perc 90 | perc 95 | perc 99
------- | ----- | ----- | --- | --- | ------- | --------- | ------- | --------
master | 1 | 977 | 582 | 720.2 | 680 | 907.4 | 921 | 954.23
master | 2 | 3146 | 591 | 1136.87 | 939.5 | 1973.8 | 2387 | 2863.85
master | 3 | 3936 | 585 | 1093.84 | 810 | 1889.4 | 3235.8 | 3780.57
master | 4 | 1228 | 610 | 735.38 | 709.5 | 890 | 973.5 | 1080.49
master | 5 | 1792 | 583 | 852.18 | 820.5 | 1173.8 | 1235.7 | 1291.06
SPARK-21869 | 1 | 375 | 222 | 258.83 | 255 | 278.2 | 306.75 | 364.11
SPARK-21869 | 2 | 338 | 219 | 252.54 | 252 | 269.1 | 283.25 | 310.28
SPARK-21869 | 3 | 446 | 213 | 258.78 | 258 | 276.1 | 282.3 | 301.46
SPARK-21869 | 4 | 374 | 219 | 261.18 | 261.5 | 278.1 | 287.3 | 304.7
SPARK-21869 | 5 | 335 | 216 | 249.43 | 245 | 273.2 | 280.05 | 316.19

> processedRowsPerSecond (higher is better)

commit | trial# | max | min | avg | median | perc 90 | perc 95 | perc 99
------- | ----- | ----- | --- | --- | ------- | --------- | ------- | --------
master | 1 | 1404056.16 | 867888.13 | 1179292.12 | 1212938.00 | 1367989.34 | 1378254.21 | 1399732.63
master | 2 | 1391035.54 | 677710.84 | 976281.35 | 960013.38 | 1213926.81 | 1271141.22 | 1340790.47
master | 3 | 1440000 | 619408.12 | 1061200.45 | 1098841.51 | 1313868.61 | 1331459.56 | 1385169.23
master | 4 | 1345291.47 | 697674.41 | 1149405.16 | 1167332.84 | 1304726.99 | 1316079.29 | 1339345.77
master | 5 | 1401869.15 | 670141.47 | 1054835.85 | 1030994.12 | 1367989.34 | 1374150.85 | 1389018.69
SPARK-21869 | 1 | 3237410.07 | 2073732.71 | 2844976.22 | 2880007.37 | 3072724.29 | 3147405.22 | 3203192.60
SPARK-21869 | 2 | 3214285.71 | 2244389.02 | 2879452.91 | 2893890.67 | 3082191.78 | 3128324.46 | 3202961.36
SPARK-21869 | 3 | 3260869.56 | 1782178.21 | 2824598.32 | 2825752.64 | 3040540.54 | 3071672.35 | 3237644.66
SPARK-21869 | 4 | 3249097.47 | 2050113.89 | 2796697.70 | 2786377.70 | 2981122.52 | 3093853.67 | 3180900.86
SPARK-21869 | 5 | 3284671.53 | 2290076.33 | 2905926.30 | 2917349.64 | 3136984.96 | 3181343.76 | 3237882.68

I ran the trials in an interleaved sequence (master -> SPARK-21869 -> master -> SPARK-21869 -> ...) so that the results would be less affected by load from other processes/VMs. (The SPARK-21869 numbers look stable.)

So, assuming the test results are valid, the patch performs even better than the current code: when multiple tasks would otherwise share the same connection, N non-shared connections show better performance and stability than 1 shared connection.

I'd appreciate it if someone could take a look at the perf test code and run the test in their own environment.
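For anyone reproducing the numbers, here is a minimal Python sketch of how one trial's row in the tables could be summarized. It also shows how the interleaved run order could be generated. It assumes the per-batch values (e.g. each batch's addBatch duration or processedRowsPerSecond) have already been collected into a dict keyed by batch id. The helper names `summarize`, `percentile` and `interleaved_schedule` are hypothetical. The percentile rule (linear interpolation, the same as numpy's default) is also an assumption. The actual perf test code may compute these differently.

```python
import math
from typing import Dict, List, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Percentile with linear interpolation between ranks (assumed rule); p in [0, 100]."""
    if not sorted_values:
        raise ValueError("no values")
    rank = (len(sorted_values) - 1) * p / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    frac = rank - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def summarize(values_by_batch: Dict[int, float], first: int = 100, last: int = 199) -> Dict[str, float]:
    """Summarize one trial using only batches first..last (inclusive), e.g. 100 ~ 199."""
    window = sorted(v for batch, v in values_by_batch.items() if first <= batch <= last)
    if not window:
        raise ValueError("no batches in the selected range")
    return {
        "max": window[-1],
        "min": window[0],
        "avg": sum(window) / len(window),
        "median": percentile(window, 50),  # mean of the two middle values for even counts
        "perc 90": percentile(window, 90),
        "perc 95": percentile(window, 95),
        "perc 99": percentile(window, 99),
    }


def interleaved_schedule(targets: List[str], trials: int) -> List[str]:
    """Alternate targets (A -> B -> A -> B ...) so background load hits each target similarly."""
    return [target for _ in range(trials) for target in targets]
```

Calling `summarize` once per trial yields one table row. `interleaved_schedule(["master", "SPARK-21869"], 5)` gives the crossed run order used for the 5 trials per target.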
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
