Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22638#discussion_r222952924
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala ---
@@ -127,16 +127,16 @@ class DatasetCacheSuite extends QueryTest with SharedSQLContext with TimeLimits
}
test("cache UDF result correctly") {
- val expensiveUDF = udf({x: Int => Thread.sleep(5000); x})
- val df = spark.range(0, 10).toDF("a").withColumn("b", expensiveUDF($"a"))
+ val expensiveUDF = udf({x: Int => Thread.sleep(2000); x})
--- End diff --
Well, I do think this will pass 100% of the time; my concern was that, in case of
a regression, we might fail to detect it. But yes, you're right about the repartition
to 1, which I hadn't considered: without it, the UDF calls might have run in parallel.
So this seems enough.
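A minimal sketch of why the single partition matters, not the actual Spark test: it times a hypothetical sleepy function over a number of rows, first one row after another (like one partition), then concurrently on Futures (like several partitions on idle cores). The names `SerialVsParallelSketch`, `expensive`, `timeMs`, `serial` and `parallel` are illustrative only, not Spark APIs.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical sketch: how long an expensive per-row function takes when rows
// are evaluated serially (one partition) vs. concurrently (many partitions).
object SerialVsParallelSketch {
  // Stand-in for the test's expensiveUDF: sleep, then return the input.
  def expensive(x: Int, sleepMs: Long): Int = {
    Thread.sleep(sleepMs)
    x
  }

  // Wall-clock milliseconds taken to evaluate `body`.
  def timeMs[A](body: => A): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000000L
  }

  // One "partition": each row starts after the previous one finishes,
  // so the total is at least rows * sleepMs.
  def serial(rows: Int, sleepMs: Long): Long =
    timeMs((0 until rows).map(x => expensive(x, sleepMs)))

  // Many "partitions": rows may run concurrently, so the total can approach
  // a single sleepMs. `blocking` lets the global pool add extra threads.
  def parallel(rows: Int, sleepMs: Long): Long =
    timeMs {
      val futures = (0 until rows).toList.map(x => Future(blocking(expensive(x, sleepMs))))
      Await.result(Future.sequence(futures), 1.minute)
    }

  def main(args: Array[String]): Unit = {
    println(s"serial:   ${serial(10, 200)} ms")
    println(s"parallel: ${parallel(10, 200)} ms")
  }
}
```

Assuming the test relies on the single partition to run the 10 UDF calls back to back, a regression that re-evaluates the UDF would take at least roughly 10 × 2000 ms = 20 s, as in `serial`; without the repartition, the calls could overlap as in `parallel` and finish in little more than one sleep, which is what could hide a regression.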