HongJin created SPARK-31260:
-------------------------------

             Summary: How to speed up WholeStageCodegen in Spark SQL Query?
                 Key: SPARK-31260
                 URL: https://issues.apache.org/jira/browse/SPARK-31260
             Project: Spark
          Issue Type: Question
          Components: Spark Core
    Affects Versions: 2.4.4
            Reporter: HongJin

It took about 2 minutes to process one 248 MB file, and about 5 minutes for 2 files. How can I tune this to maximize performance?

Spark is initialized as below:

{{.setMaster(numCores)}}
{{.set("spark.driver.host", "localhost")}}
{{.set("spark.executor.cores", "2")}}
{{.set("spark.num.executors", "2")}}
{{.set("spark.executor.memory", "4g")}}
{{.set("spark.dynamicAllocation.enabled", "true")}}
{{.set("spark.dynamicAllocation.minExecutors", "2")}}
{{.set("spark.dynamicAllocation.maxExecutors", "2")}}
{{.set("spark.ui.enabled", "true")}}
{{.set("spark.sql.shuffle.partitions", defaultPartitions)}}

{{joinedDf = upperCaseLeft.as("l")}}
{{  .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer")}}
{{  .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col, toleranceValue, caseSensitive)): _*)}}

{{data = joinedDf.take(1000)}}

[https://i.stack.imgur.com/oeYww.png]
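
For anyone trying to reproduce this, below is a minimal sketch of the same shape of query on toy data. It is not the reporter's actual job. It assumes a local master (`local[4]`) and a small `spark.sql.shuffle.partitions` value (`8`), both chosen only for illustration. The object name `JoinTuningSketch`, the column names `key` and `value`, and the sample rows are hypothetical. The sketch prints the physical plan with `explain()`, so the stages that the `full_outer` join produces can be inspected before `take(1000)` runs.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical reproduction harness; names and values are illustrative only.
object JoinTuningSketch {

  // Builds two tiny DataFrames, full-outer-joins them on "key",
  // prints the physical plan, and returns how many rows take(1000) collected.
  def run(): Int = {
    val spark = SparkSession.builder()
      .appName("join-tuning-sketch")
      .master("local[4]")                           // assumption: local mode, 4 threads
      .config("spark.sql.shuffle.partitions", "8")  // assumption: far below the default of 200
      .getOrCreate()
    import spark.implicits._

    // Toy stand-ins for upperCaseLeft and upperCaseRight.
    val left  = Seq(("A", 1.0), ("B", 2.0)).toDF("key", "value")
    val right = Seq(("A", 1.5), ("C", 3.0)).toDF("key", "value")

    val joined = left.as("l").join(right.as("r"), Seq("key"), "full_outer")

    // Print the physical plan to see which join strategy and exchanges were chosen.
    joined.explain()

    // Same action as in the report: collect at most 1000 rows to the driver.
    val rows = joined.take(1000)
    spark.stop()
    rows.length
  }

  def main(args: Array[String]): Unit =
    println(s"collected ${run()} rows")
}
```

If the master really is a `local[N]` string, the executor count and dynamic allocation settings in the report most likely have no effect, since everything runs inside the driver JVM. In that case the thread count in the master string and `spark.sql.shuffle.partitions` are probably the settings worth varying first.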