Xi Lyu created SPARK-47818:
------------------------------

             Summary: Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests
                 Key: SPARK-47818
                 URL: https://issues.apache.org/jira/browse/SPARK-47818
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 4.0.0
            Reporter: Xi Lyu
             Fix For: 4.0.0

While a DataFrame is built step by step, each newly generated DataFrame starts with an empty schema. If user code frequently accesses the schema of these new DataFrames, for example through `df.columns`, every access results in an Analyze request to the server. Each request reanalyzes the entire plan, which leads to poor performance, especially when constructing highly complex plans.

By introducing a plan cache in SparkConnectPlanner, we aim to reduce the overhead of this repeated analysis. Significant computation can be saved if the resolved logical plan of a subtree of the plan can be cached.
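
To illustrate why caching resolved subtrees helps, here is a minimal toy model in plain Python. It is not the SparkConnectPlanner implementation and uses no Spark API: `Plan`, `ToyPlanner`, and `build_and_inspect` are hypothetical names invented for this sketch, and "analysis" is reduced to collecting column names. It assumes plan nodes are immutable and hashable, so a plan subtree can serve as the cache key.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical unresolved plan node: an operation over an optional child."""
    op: str
    column: Optional[str] = None
    child: Optional["Plan"] = None


class ToyPlanner:
    """Toy stand-in for a planner (assumption: not the real Spark class)."""

    def __init__(self, use_cache: bool) -> None:
        self.use_cache = use_cache
        self.cache: Dict[Plan, Tuple[str, ...]] = {}
        self.nodes_analyzed = 0  # counts how many nodes were actually analyzed

    def analyze(self, plan: Plan) -> Tuple[str, ...]:
        """Return the resolved column names of `plan`."""
        if self.use_cache and plan in self.cache:
            return self.cache[plan]  # subtree already resolved, skip the work
        self.nodes_analyzed += 1
        child_cols = self.analyze(plan.child) if plan.child is not None else ()
        if plan.op == "with_column":
            cols = child_cols + (plan.column,)
        else:  # "source" contributes no columns in this sketch
            cols = child_cols
        if self.use_cache:
            self.cache[plan] = cols
        return cols


def build_and_inspect(planner: ToyPlanner, steps: int) -> Tuple[str, ...]:
    """Build a plan step by step, reading the schema after every step."""
    plan = Plan(op="source")
    cols = planner.analyze(plan)
    for i in range(steps):
        plan = Plan(op="with_column", column=f"c{i}", child=plan)
        cols = planner.analyze(plan)  # analogous to reading df.columns
    return cols


uncached = ToyPlanner(use_cache=False)
cached = ToyPlanner(use_cache=True)
build_and_inspect(uncached, 50)
build_and_inspect(cached, 50)
print(uncached.nodes_analyzed)  # 1326: every schema access re-walks the whole plan
print(cached.nodes_analyzed)    # 51: each node is analyzed once
```

In this model, work without the cache grows quadratically with the number of steps, while with the cache it grows linearly, because each schema access only has to resolve the newly added node on top of an already cached subtree.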