doki23 commented on code in PR #45181:
URL: https://github.com/apache/spark/pull/45181#discussion_r1498639875


##########
sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala:
##########
@@ -82,6 +82,26 @@ class DatasetCacheSuite extends QueryTest
     assert(cached.storageLevel == StorageLevel.NONE, "The Dataset should not be cached.")
   }
 
+  test("SPARK-46992 collect before persisting") {
+    val ds = Seq(("a", 1), ("b", 2), ("c", 3)).toDS().select(expr("_2 + 1").as[Int])
+    // collect first
+    ds.collect()
+    // and then cache it
+    val cached = ds.cache()
+    // ds is not cached
+    assertNotCached(ds)
+    // Make sure, the Dataset is indeed cached.
+    assertCached(cached)
+
+    // Check result.
+    checkDataset(
+      cached,
+      2, 3, 4)

Review Comment:
   This `checkDataset` call makes sure that the cached data of the new
   `Dataset` instance is as expected. I'll also add one more case verifying
   that the results of `cached.count()` and `cached.collect()` are consistent.
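
   A rough sketch of what such a consistency case might look like, written as
   a standalone program rather than as a `DatasetCacheSuite` test. The object
   name, the `local[1]` master, and the app name are illustrative assumptions
   and not part of the PR. The sketch only checks that `count()` and
   `collect()` agree on the cached `Dataset`, so it does not depend on whether
   `cache()` returns a new instance.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Hypothetical standalone harness; the real case would live in DatasetCacheSuite.
object CollectBeforeCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")                      // assumption: local single-threaded run
      .appName("collect-before-cache-sketch")  // assumption: illustrative name
      .getOrCreate()
    import spark.implicits._

    try {
      val ds = Seq(("a", 1), ("b", 2), ("c", 3)).toDS().select(expr("_2 + 1").as[Int])
      // Collect first, then cache, mirroring the order used in the test under review.
      ds.collect()
      val cached = ds.cache()

      // count() and collect() on the cached Dataset should describe the same rows.
      val collected = cached.collect()
      assert(cached.count() == collected.length.toLong)
      assert(collected.sorted.sameElements(Array(2, 3, 4)))
    } finally {
      spark.stop()
    }
  }
}
```

   In the suite itself the same two assertions could sit right after the
   existing `checkDataset` call, reusing `cached` from the test above.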



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

