viirya commented on issue #25280: [SPARK-28548][SQL] explain() shows wrong 
result for persisted DataFrames after some operations
URL: https://github.com/apache/spark/pull/25280#issuecomment-516024047
 
 
   We actually use the Dataset's query execution to execute it. Suppose we 
execute a Dataset, so its physical plan is materialized, and then persist it. In 
2.4.3, although df.explain shows a cached plan, I think execution still uses the 
physical plan without the cache? Does this fix have the same issue?
   
   That said, in 2.4.3, df.explain shows the query plan with the current state 
(e.g., cache, temp views), so I think it doesn't really match the Dataset's 
actual execution.
   
   Like:
   
   ```scala
   val df = spark.range(10)
   df.explain()  // shows the query plan without the cache
   df.collect()  // executes without the cache
   df.persist()
   df.explain()  // shows the query plan with the cache
   df.collect()  // still executes without the cache
   ```
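   One way to see the mismatch directly is to compare what `df.explain()` prints against `df.queryExecution.executedPlan`, which is the plan Spark actually runs. Since `executedPlan` is a lazy val, it keeps the pre-persist physical plan once it has been materialized. A minimal sketch, assuming a local SparkSession; the object and method names here (`CacheCheck`, `describePlan`) are illustrative, not part of any Spark API:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   object CacheCheck {
     def describePlan(): String = {
       // assumption: a throwaway local session just for this illustration
       val spark = SparkSession.builder()
         .master("local[1]")
         .appName("cache-check")
         .getOrCreate()
       try {
         val df = spark.range(10)
         df.collect()  // materializes queryExecution.executedPlan, without cache
         df.persist()
         // executedPlan was already computed above, so it may lack an
         // InMemoryTableScan even though df.explain() would show one
         df.queryExecution.executedPlan.toString
       } finally {
         spark.stop()
       }
     }
   }
   ```
   
   If the returned plan string contains no InMemoryTableScan while explain() shows one, the printed plan and the executed plan have diverged.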
   