JoshRosen opened a new pull request #34691:
URL: https://github.com/apache/spark/pull/34691


   ### What changes were proposed in this pull request?
   
   This PR adds caching to `LogicalPlan.isStreaming`: the default 
implementation's result will now be cached in a `private lazy val`.
   
   ### Why are the changes needed?
   
   This improves the performance of the `DeduplicateRelations` analyzer rule.
   
   The default implementation of `isStreaming` recursively visits every node in 
the subtree rooted at the node it is called on. 
`DeduplicateRelations.renewDuplicatedRelations` is recursively invoked on every 
node in the tree, and each invocation calls `isStreaming`. For a plan with `n` 
nodes, this leads to `O(n^2)` invocations of `isStreaming` on leaf nodes in the 
worst case.
   
   Caching `isStreaming` avoids this performance problem.
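
   The effect can be illustrated with a minimal, self-contained sketch. It does not use Spark's actual classes: `Node`, `Leaf`, `Uncached`, `Cached`, `leftDeep`, `visitAll`, and `countLeafCalls` are hypothetical stand-ins, and `visitAll` only mimics a rule that calls `isStreaming` once at every node while recursing.

```scala
object IsStreamingCachingSketch {
  // Counts how many times any leaf's isStreaming is evaluated.
  var leafCalls = 0L

  // Hypothetical stand-in for a plan node (not Spark's LogicalPlan).
  abstract class Node {
    def children: Seq[Node]
    def isStreaming: Boolean
  }

  class Leaf(streaming: Boolean) extends Node {
    val children: Seq[Node] = Nil
    def isStreaming: Boolean = { leafCalls += 1; streaming }
  }

  // Uncached: every call walks the whole subtree again.
  class Uncached(left: Node, right: Node) extends Node {
    val children: Seq[Node] = Seq(left, right)
    def isStreaming: Boolean = children.exists(_.isStreaming)
  }

  // Cached: the subtree is walked at most once per node.
  class Cached(left: Node, right: Node) extends Node {
    val children: Seq[Node] = Seq(left, right)
    private lazy val cachedIsStreaming: Boolean = children.exists(_.isStreaming)
    def isStreaming: Boolean = cachedIsStreaming
  }

  // Builds a left-deep tree with n binary nodes over n + 1 non-streaming leaves.
  def leftDeep(n: Int, join: (Node, Node) => Node): Node =
    (1 to n).foldLeft[Node](new Leaf(false))((acc, _) => join(acc, new Leaf(false)))

  // Mimics a rule that calls isStreaming at every node while recursing.
  def visitAll(node: Node): Unit = {
    node.isStreaming
    node.children.foreach(visitAll)
  }

  // Resets the counter, runs the traversal, and returns the leaf call count.
  def countLeafCalls(root: Node): Long = {
    leafCalls = 0L
    visitAll(root)
    leafCalls
  }
}
```

   In this sketch, calling `countLeafCalls(leftDeep(n, new Uncached(_, _)))` grows quadratically in `n`, while the same call with `new Cached(_, _)` grows linearly, because each inner node computes its result only once.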
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Correctness should be covered by existing tests.
   
   This significantly improved `DeduplicateRelations` performance in local 
microbenchmarking with large query plans.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



