viirya commented on issue #25280: [SPARK-28548][SQL] explain() shows wrong result for persisted DataFrames after some operations
URL: https://github.com/apache/spark/pull/25280#issuecomment-516024047

We actually execute a Dataset through its query execution. If we have already executed a Dataset, its physical plan is materialized; if we then persist it, in 2.4.3 `df.explain` shows a cached plan, but I think execution still uses the physical plan without the cache. Does this fix have the same issue?

That said, in 2.4.3 `df.explain` shows the query plan reflecting the current state (cache, temp views, etc.), so I think it doesn't really match the actual Dataset execution. For example:

```scala
val df = spark.range(10)
df.explain    // shows the query plan without the cache
df.collect()  // executes without the cache
df.persist
df.explain    // shows the query plan with the cache
df.collect()  // still executes without the cache
```
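To illustrate the mechanism behind the mismatch described above: a minimal, Spark-free sketch of why a materialized physical plan does not pick up a later `persist()`. The class `FakeQueryExecution` and its fields are hypothetical stand-ins, not Spark code; the assumption is only that, as in Spark, the executed plan is held in a `lazy val`, which is evaluated once and then frozen.

```scala
// Minimal sketch (NOT Spark internals): a stand-in for QueryExecution whose
// executed plan, like Spark's, is a lazy val -- computed once, then frozen.
object LazyPlanDemo {
  class FakeQueryExecution(var cached: Boolean) {
    // Evaluated on first access only; later changes to `cached` are ignored,
    // analogous to calling persist() after the plan has been materialized.
    lazy val executedPlan: String =
      if (cached) "InMemoryTableScan" else "Range"
  }

  def main(args: Array[String]): Unit = {
    val qe = new FakeQueryExecution(cached = false)
    println(qe.executedPlan) // materializes the plan: "Range"
    qe.cached = true         // analogous to df.persist() after execution
    println(qe.executedPlan) // still "Range": the lazy val was already evaluated
  }
}
```

A fresh `explain` that re-plans from the logical plan would see the cache, while execution that reuses the already-materialized `lazy val` would not, which is exactly the divergence the comment describes.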
