[GitHub] [spark] HeartSaVioR opened a new pull request #31056: [SPARK-33635][SS] Adjust the order of check in KafkaTokenUtil.needTokenUpdate to remedy perf regression

GitBox Tue, 05 Jan 2021 20:39:02 -0800


HeartSaVioR opened a new pull request #31056:
URL: https://github.com/apache/spark/pull/31056



   ### What changes were proposed in this pull request?
   
   This PR proposes to reorder the checks in KafkaTokenUtil.needTokenUpdate 
so that short-circuit evaluation applies to the cases that do not use a 
delegation token (insecure clusters, and secured clusters without a delegation 
token). This removes most of the performance regression.
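
The effect of reordering can be illustrated with a minimal sketch. This is not the Spark implementation: the function names, the `expensive_check` callable, and the `CountingCheck` helper are hypothetical, and the sketch simply assumes that one of the conditions is costly to evaluate while the others are cheap lookups. Only the `sasl.jaas.config` key is a real Kafka client setting.

```python
SASL_JAAS_CONFIG = "sasl.jaas.config"  # real Kafka client config key


class CountingCheck:
    """Hypothetical stand-in for a costly condition; counts how often it runs."""

    def __init__(self, result):
        self.result = result
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return self.result


def need_token_update_costly_first(params, cluster_config, expensive_check, token_jaas):
    # Assumed "before" shape: the costly check is evaluated on every call,
    # even when no delegation token is configured at all.
    return (
        expensive_check()
        and cluster_config is not None
        and SASL_JAAS_CONFIG in params
        and token_jaas != params[SASL_JAAS_CONFIG]
    )


def need_token_update_cheap_first(params, cluster_config, expensive_check, token_jaas):
    # Assumed "after" shape: cheap checks come first, so `and` short-circuits
    # before the costly check whenever no delegation token is in use.
    return (
        cluster_config is not None
        and SASL_JAAS_CONFIG in params
        and expensive_check()
        and token_jaas != params[SASL_JAAS_CONFIG]
    )


if __name__ == "__main__":
    before, after = CountingCheck(True), CountingCheck(True)
    for _ in range(1000):  # simulate a hot read path without delegation tokens
        need_token_update_costly_first({}, None, before, "latest")
        need_token_update_cheap_first({}, None, after, "latest")
    print("costly-first evaluations:", before.calls)
    print("cheap-first evaluations:", after.calls)
```

Both orderings return the same result for every input; they differ only in how often the costly condition is evaluated when the cheap conditions already rule out a token update.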
   
   ### Why are the changes needed?
   
   There's a serious performance regression between Spark 2.4 and Spark 3.0 on 
the read path of the Kafka data source.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manually ran a reproducer (https://github.com/codegorillauk/spark-kafka-read, 
modified to just count the records instead of writing to a Kafka topic) and 
measured the elapsed time.
   
   > the branch applying the change, with measurement added
   
   https://github.com/HeartSaVioR/spark/commits/debug-SPARK-33635-v3.0.1
   
   > the branch adding only the measurement
   
   
   https://github.com/HeartSaVioR/spark/commits/debug-original-ver-SPARK-33635-v3.0.1
   
   > the result (before the fix)
   
   count: 10280000
   Took 41.634007047 secs
   
   21/01/06 13:16:07 INFO KafkaDataConsumer: debug ver. 17-original
   21/01/06 13:16:07 INFO KafkaDataConsumer: Total time taken to retrieve: 82118 ms
   
   > the result (after the fix)
   
   count: 10280000
   Took 7.964058475 secs
   
   21/01/06 13:08:22 INFO KafkaDataConsumer: debug ver. 17
   21/01/06 13:08:22 INFO KafkaDataConsumer: Total time taken to retrieve: 987 ms


