[GitHub] [spark] joshrosen-stripe commented on a change in pull request #25503: [SPARK-28702][SQL] Display useful error message (instead of NPE) for invalid Dataset operations

GitBox Mon, 19 Aug 2019 19:39:37 -0700

joshrosen-stripe commented on a change in pull request #25503: [SPARK-28702][SQL] Display useful error message (instead of NPE) for invalid Dataset operations
URL: https://github.com/apache/spark/pull/25503#discussion_r315484406


 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
 ##########
 @@ -184,11 +184,26 @@ private[sql] object Dataset {
  */
 @Stable
 class Dataset[T] private[sql](
-    @transient val sparkSession: SparkSession,
+    @transient val _sparkSession: SparkSession,
     @DeveloperApi @Unstable @transient val queryExecution: QueryExecution,
     @DeveloperApi @Unstable @transient val encoder: Encoder[T])
   extends Serializable {
 
+  def sparkSession: SparkSession = {
+    if (_sparkSession == null) {
+      throw new SparkException(
+      "This Dataset lacks a SparkSession. It could happen in the following cases: \n(1) Dataset " +
+      "transformations and actions are NOT invoked by the driver, but inside of other " +
+      "transformations; for example, dataset1.map(x => dataset2.values.count() * x) is invalid " +
+      "because the values transformation and count action cannot be performed inside of the " +
+      "dataset1.map transformation. For more information, see SPARK-28702.\n(2) When a Spark " +
 
 Review comment:
   We may want to either reword or remove bullet point (2): it discusses DStreams, which I think are unlikely to be used with Datasets.
   
   (For reference, https://github.com/apache/spark/pull/11595 added this wording for the RDD version of this patch.)
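
   For context, here is a minimal, Spark-free sketch of why the field can end up null and what a guarded accessor like the one in this diff does about it. Everything in it is an assumption made for illustration: `FakeSession`, `Handle`, and `roundTrip` are hypothetical names, `IllegalStateException` stands in for `SparkException`, and a plain Java serialization round trip stands in for a Dataset being captured in a closure and shipped off the driver. It is not the code from this patch.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for SparkSession; deliberately NOT Serializable.
final class FakeSession(val name: String)

// Hypothetical stand-in for Dataset. The session field is @transient, so Java
// serialization skips it and the deserialized copy sees null.
class Handle(@transient private val _session: FakeSession, val id: Int) extends Serializable {
  // Guarded accessor: fail with a descriptive message instead of letting a
  // later dereference surface as a bare NullPointerException.
  def session: FakeSession = {
    if (_session == null) {
      throw new IllegalStateException(
        "This Handle lacks a session; it was probably used after being serialized.")
    }
    _session
  }
}

object GuardedAccessorSketch {
  // Serialize and deserialize a value in memory, roughly what happens when an
  // object is captured in a closure and sent to another JVM.
  def roundTrip[A <: AnyRef](value: A): A = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val original = new Handle(new FakeSession("driver"), 1)
    println(original.session.name) // the original still has its session

    val copy = roundTrip(original) // non-transient fields survive, the session does not
    try copy.session
    catch { case e: IllegalStateException => println(e.getMessage) }
  }
}
```

   The check lives in the accessor rather than the constructor because deserialization does not run the constructor, so the accessor is the first place the missing session can be detected.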

