HyukjinKwon commented on a change in pull request #25503: [SPARK-28702][SQL]
Display useful error message (instead of NPE) for invalid Dataset operations
URL: https://github.com/apache/spark/pull/25503#discussion_r315648945
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -184,11 +184,22 @@ private[sql] object Dataset {
*/
@Stable
class Dataset[T] private[sql](
- @transient val sparkSession: SparkSession,
+ @transient private val _sparkSession: SparkSession,
@DeveloperApi @Unstable @transient val queryExecution: QueryExecution,
@DeveloperApi @Unstable @transient val encoder: Encoder[T])
extends Serializable {
+ @transient lazy val sparkSession: SparkSession = {
+ if (_sparkSession == null) {
+ throw new SparkException(
+ "Dataset transformations and actions can only be invoked by the driver, not inside of" +
+ " other transformations; for example, dataset1.map(x => dataset2.values.count() * x)" +
Review comment:
While I agree with this in general, `not inside of other transformations`
can be controversial, e.g. `Dataset.transform`. I would just remove these words
or reword them.
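
To illustrate the concern, here is a minimal sketch (not part of the PR) of Dataset operations invoked legitimately inside `Dataset.transform`. The function passed to `transform` runs on the driver, so nothing is serialized to executors. The object name `TransformExample`, the helper `keepEven`, and the local-mode session settings are assumptions made for this example only.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical example object; none of these names come from the PR.
object TransformExample {

  // Runs on the driver: it only derives a new Dataset from the given one.
  def keepEven(ds: Dataset[Long]): Dataset[Long] =
    ds.filter((n: Long) => n % 2 == 0)

  def run(): Long = {
    // Assumption: a local single-threaded session is enough for the sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("transform-example")
      .getOrCreate()
    try {
      import spark.implicits._
      val numbers: Dataset[Long] = spark.range(10).as[Long]
      // Dataset.transform applies keepEven on the driver, so calling
      // Dataset methods inside it is valid even though it is named "transform".
      numbers.transform(keepEven).count()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

By contrast, the `dataset1.map(x => dataset2.values.count() * x)` case from the error message references `dataset2` inside a closure that executes on executors, which is the situation the new check is meant to catch.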