[GitHub] [spark] rxin commented on issue #24991: [SPARK-28188] Materialize Dataframe API

GitBox Wed, 03 Jul 2019 17:38:24 -0700

rxin commented on issue #24991: [SPARK-28188] Materialize Dataframe API
URL: https://github.com/apache/spark/pull/24991#issuecomment-508297811
 
 
   Got it. But the name is plain wrong because the function doesn't materialize 
the DataFrame. As a matter of fact, if you run this "materialize" on a query 
plan without a shuffle, it becomes a completely useless job that wastes a lot 
of resources.
   
   I've seen a different use case myself: I want to measure the execution time 
of a query plan without materializing the data, invoking any I/O, or adding 
any other overhead. What I ended up doing was implementing a simple data sink 
that doesn't write anything.
   
   Looks like that can be used here as well?
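
   To illustrate the no-op sink idea, here is a minimal sketch in plain Python 
rather than Spark. The names `lazy_plan`, `noop_sink` and `time_plan` are 
hypothetical and exist only for this example; the generator merely stands in 
for a lazy query plan. The sink pulls every row so the plan runs end to end, 
then discards the row, so the measured time covers execution only. In Spark 
itself, I believe versions 3.0 and later ship a built-in `noop` write format 
that serves the same purpose.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def lazy_plan(n: int) -> Iterator[int]:
    """Hypothetical stand-in for a lazy query plan: nothing runs until consumed."""
    for i in range(n):
        yield i * i  # per-row work, performed only when a sink pulls the row


def noop_sink(rows: Iterable[int]) -> int:
    """Pull every row so the whole plan executes, then discard it (no writes)."""
    count = 0
    for _ in rows:
        count += 1
    return count


def time_plan(plan: Callable[[], Iterable[int]]) -> Tuple[int, float]:
    """Run a plan against the no-op sink and return (row count, elapsed seconds)."""
    start = time.perf_counter()
    count = noop_sink(plan())
    return count, time.perf_counter() - start


rows, seconds = time_plan(lambda: lazy_plan(100_000))
print(f"executed {rows} rows in {seconds:.4f}s, nothing written")
```

   Unlike caching or collecting, nothing is retained after the run, so the 
measurement is not skewed by storage or serialization costs.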


