Shixiong Zhu created SPARK-37646:
------------------------------------

             Summary: Avoid touching Scala reflection APIs in the lit function
                 Key: SPARK-37646
                 URL: https://issues.apache.org/jira/browse/SPARK-37646
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Shixiong Zhu

Currently, lit is slow under high concurrency because it goes through Scala reflection code, which takes global locks. For example, running the following test locally on Spark 3.2 shows the difference between building a Column directly from a Literal and calling lit:

{code:java}
scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Literal

val parallelism = 50

// Baseline: build the Column directly from a Literal, bypassing lit.
def testLiteral(): Unit = {
  val ts = for (_ <- 0 until parallelism) yield {
    new Thread() {
      override def run() {
        for (_ <- 0 until 50) {
          new Column(Literal(0L))
        }
      }
    }
  }
  ts.foreach(_.start())
  ts.foreach(_.join())
}

// Same workload, but going through functions.lit.
def testLit(): Unit = {
  val ts = for (_ <- 0 until parallelism) yield {
    new Thread() {
      override def run() {
        for (_ <- 0 until 50) {
          lit(0L)
        }
      }
    }
  }
  ts.foreach(_.start())
  ts.foreach(_.join())
}

println("warmup")
testLiteral()
testLit()

println("lit: false")
spark.time {
  testLiteral()
}
println("lit: true")
spark.time {
  testLit()
}

// Exiting paste mode, now interpreting.

warmup
lit: false
Time taken: 8 ms
lit: true
Time taken: 682 ms
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Literal
parallelism: Int = 50
testLiteral: ()Unit
testLit: ()Unit
{code}
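One possible direction, sketched here only as an illustration and not as the actual patch: take a fast path that pattern-matches on the runtime value instead of resolving a TypeTag. The object LitWorkaround and the method fastLit are hypothetical names introduced for this sketch. It assumes the Spark 3.2 API, where Column has a public constructor taking an Expression (as used in the test above) and Literal.apply(Any) picks the data type from the runtime class of the value.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Literal

// Hypothetical helper, not part of Spark: builds a literal Column without
// going through a TypeTag, so no Scala runtime reflection is involved.
object LitWorkaround {
  def fastLit(value: Any): Column = value match {
    // An existing Column is passed through unchanged.
    case c: Column => c
    // Literal.apply(Any) matches on the runtime type of the value
    // (Long, Int, String, ...) and throws for types it does not support.
    case other => new Column(Literal(other))
  }
}
```

Swapping lit(0L) for LitWorkaround.fastLit(0L) in testLit above should be one way to check whether the slowdown comes from the reflection path. Values whose type cannot be determined from the runtime class alone (for example collections) would still need the reflection-based path.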