maropu edited a comment on pull request #27246:
URL: https://github.com/apache/spark/pull/27246#issuecomment-646902510


   > Could you please let me know would it be OK to hard-code the read buffer size to 1024?
   
   Do you think the performance is independent of the running platform, e.g., CPU arch and disk I/O? If it's independent, the hard-coded value looks okay.
   
   > With 10TB TPCDS data set I tested spilling with query q14a and buffer size of 1024. Execution with hard-coded read buffer size is faster by 37% (27 min vs 37 min) comparing to the execution when buffer size is parameterized and the same size 1024 is used. Query q14a, for 10TB data set, generates around 180 million joins per partition and when buffer size is parameterized, that translates into 10 min longer execution time.
   
   Why does the parameterized one have so much overhead?
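
   To make the two shapes being compared concrete, here is a minimal illustrative sketch (the class and method names are hypothetical, not the PR's actual code): one reader gets its buffer size from a compile-time constant, the other receives it as a constructor parameter, e.g. resolved from a config entry at instantiation time.

   ```java
   import java.io.BufferedInputStream;
   import java.io.ByteArrayInputStream;
   import java.io.IOException;
   import java.io.InputStream;

   // Illustrative only: contrasts a spill-read buffer whose size is a
   // compile-time constant with one supplied as a runtime parameter.
   public class SpillReadBufferSketch {

     // Hard-coded variant: the buffer size is a constant.
     static final int HARD_CODED_BUFFER_SIZE = 1024;

     static InputStream hardCodedReader(InputStream in) {
       return new BufferedInputStream(in, HARD_CODED_BUFFER_SIZE);
     }

     // Parameterized variant: the size is passed in at construction time,
     // e.g. read from a config entry (hypothetical wiring).
     static InputStream parameterizedReader(InputStream in, int configuredBufferSize) {
       return new BufferedInputStream(in, configuredBufferSize);
     }

     public static void main(String[] args) throws IOException {
       byte[] data = new byte[64 * 1024];
       try (InputStream r1 = hardCodedReader(new ByteArrayInputStream(data));
            InputStream r2 = parameterizedReader(new ByteArrayInputStream(data), 1024)) {
         // Both read the same bytes; only how the buffer size is obtained differs.
         while (r1.read() != -1) { /* drain */ }
         while (r2.read() != -1) { /* drain */ }
       }
     }
   }
   ```

   With the same effective size (1024) in both variants, I would not expect a 10 min gap from this difference alone, which is why the overhead of the parameterized path needs an explanation.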

