Vitalii Li created SPARK-39169:
----------------------------------

             Summary: Optimize FIRST when used as non-aggregate
                 Key: SPARK-39169
                 URL: https://issues.apache.org/jira/browse/SPARK-39169
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Vitalii Li

When `FIRST` is the only aggregate function in an `Aggregate`, we could either rewrite the whole query or optimize the execution logic.

* Plan => `SELECT FIRST(<col>) FROM <table> [GROUP BY <col>]` => `SELECT <col> FROM <table> LIMIT 1`. Note that setting `ignoreNulls` to `true` should block this rewrite, since the results could differ when <col> contains `NULL` values (for example, when the first row is `NULL` but a later one is not).
* Execution => `SELECT FIRST(<col>) FROM <table> GROUP BY <col2>` => short-circuit the iteration per key once a value for `FIRST` is set.
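
The execution-side idea can be illustrated with a minimal sketch in plain Python. This is not Spark's aggregation code: the function name `first_per_key`, the `ignore_nulls` flag, and the dictionary-of-iterables input are assumptions made only for illustration. It shows that once `FIRST` has a value for a key, the remaining rows of that key need not be read, and why `ignoreNulls` changes how far the scan must go.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple


def first_per_key(
    groups: Dict[Hashable, Iterable[Optional[int]]],
    ignore_nulls: bool = False,
) -> Tuple[Dict[Hashable, Optional[int]], int]:
    """Return (FIRST value per key, number of rows actually read).

    Hypothetical helper: `groups` maps each grouping key to its rows in
    arrival order; None stands in for SQL NULL.
    """
    result: Dict[Hashable, Optional[int]] = {}
    rows_read = 0
    for key, values in groups.items():
        first: Optional[int] = None
        for value in values:
            rows_read += 1
            if value is None and ignore_nulls:
                continue  # keep scanning this key for a non-NULL value
            first = value
            break  # short circuit: FIRST is set, skip the rest of this key
        result[key] = first
    return result, rows_read


data = {"a": [1, 2, 3], "b": [None, 5, 6]}
print(first_per_key(data))                     # reads one row per key
print(first_per_key(data, ignore_nulls=True))  # must scan past the NULL in "b"
```

With `ignore_nulls=False` each key stops after its first row, even when that row is `NULL`. With `ignore_nulls=True` the scan continues until a non-`NULL` value appears, which is the case where a plain `LIMIT 1` rewrite would return a different result.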