LuciferYang commented on code in PR #37843:
URL: https://github.com/apache/spark/pull/37843#discussion_r972648962


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Expression.java:
##########
@@ -44,7 +46,12 @@ public interface Expression {
    * List of fields or columns that are referenced by this expression.
    */
   default NamedReference[] references() {
-    return Arrays.stream(children()).map(e -> e.references())
-      .flatMap(Arrays::stream).distinct().toArray(NamedReference[]::new);
+    // SPARK-40398: Replace `Arrays.stream()...distinct()`
+    // to this for perf gain, the result order is not important.
+    Set<NamedReference> set = new HashSet<>();
+    for (Expression e : children()) {
+      Collections.addAll(set, e.references());
+    }
+    return set.toArray(new NamedReference[0]);

Review Comment:
   The failure has now changed: the Python linter check failed.
   https://github.com/LuciferYang/spark/actions/runs/3065275580/jobs/4949221030
   
   
   ```
   starting mypy annotations test...
   annotations failed mypy checks:
   python/pyspark/pandas/window.py:112: error: Module has no attribute "lit"  [attr-defined]
   Found 1 error in 1 file (checked 340 source files)
   1
   ```
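
For readers following the diff hunk above, here is a minimal standalone sketch of the two de-duplication approaches it swaps. It is not the Spark code itself: `String` stands in for `NamedReference`, and the class name `ReferencesSketch` and the `childRefs` input are hypothetical, chosen only for illustration. It shows that both variants return the same set of elements, with the `HashSet` variant making no promise about order.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class ReferencesSketch {
  // Stream-based variant, mirroring the removed lines.
  // Assumption: String stands in for NamedReference.
  static String[] viaStream(String[][] childRefs) {
    return Arrays.stream(childRefs)
      .flatMap(Arrays::stream).distinct().toArray(String[]::new);
  }

  // HashSet-based variant, mirroring the added lines.
  // The order of the returned array is unspecified.
  static String[] viaHashSet(String[][] childRefs) {
    Set<String> set = new HashSet<>();
    for (String[] refs : childRefs) {
      Collections.addAll(set, refs);
    }
    return set.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // Hypothetical input: the references reported by three child expressions.
    String[][] childRefs = { {"a", "b"}, {"b", "c"}, {} };
    Set<String> fromStream = new HashSet<>(Arrays.asList(viaStream(childRefs)));
    Set<String> fromSet = new HashSet<>(Arrays.asList(viaHashSet(childRefs)));
    // Compare as sets, since only the contents (not the order) should match.
    System.out.println(fromStream.equals(fromSet));
  }
}
```

Callers that relied on the encounter order produced by `distinct()` would need to be checked separately, since a `HashSet` does not preserve it.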



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

