juliuszsompolski commented on PR #48211:
URL: https://github.com/apache/spark/pull/48211#issuecomment-2368814534

   @cloud-fan @JoshRosen 
   Given the change, do you think it would make sense to add a `copy()` API to 
Dataset, something like
   ```scala
   /**
    * Create a copy of this Dataset with a fresh execution.
    *
    * While a Dataset object caches its query plan, this will create a new
    * Dataset that will start from scratch from the parsed logical plan.
    */
   def copy(): Dataset[T] = {
     new Dataset[T](this.sparkSession, this.queryExecution.logical, this.encoder)
   }
   ```
   so that users can re-run all stages that are otherwise lazily cached?
   This was relevant even before this change, because I don't think there was a 
no-op way to get a "fresh" Dataset object. I suppose the closest to a no-op 
would have been `.as[T]` with `T` being the existing encoder, but one would 
need to know the encoder of the Dataset to pass it.
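
   For illustration, here is a sketch of the workaround versus the proposed 
API. This assumes a local SparkSession; `copy()` is the *proposed* method from 
this comment, not an existing Dataset API, and the setup values are 
placeholders:
   ```scala
   import org.apache.spark.sql.{Dataset, SparkSession}

   // Hypothetical setup: any typed Dataset will do.
   val spark = SparkSession.builder().master("local[*]").getOrCreate()
   import spark.implicits._
   val ds: Dataset[Long] = spark.range(10).as[Long]

   // Today's closest-to-no-op workaround: re-apply `.as[T]`, which requires
   // the caller to know (or have in implicit scope) the Dataset's encoder.
   val fresh1: Dataset[Long] = ds.as[Long]

   // Proposed: copy() restarts from the parsed logical plan, so the new
   // Dataset re-runs analysis and builds a fresh QueryExecution.
   val fresh2: Dataset[Long] = ds.copy()
   fresh2.count()  // lazy stages run again rather than reusing cached state
   ```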


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

