Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22631#discussion_r223224573
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
    @@ -110,7 +112,7 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
       }
     
       test("cast string to timestamp") {
    -    for (tz <- ALL_TIMEZONES) {
    +    for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {
    --- End diff --
    
    Tests should ideally be deterministic; any sources of randomness should be seeded. Do you see one that isn't?
    
    I think this is like deciding we'll randomly run just 90% of all test suites every time, to save time. That's squarely against good practice.
    
    There are other solutions:
    1) pick a subset of timezones that we're confident exercises the code, and explicitly test just those
    2) parallelize these tests within the test suite
    
    The latter should be trivial in this case: `ALL_TIMEZONES.par.foreach { tz =>` instead of the sequential loop. It's the same amount of work, but 8x or 16x faster by wall-clock time, depending on how many cores are available. What about that?
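    A minimal sketch of that suggestion, assuming Scala 2.11/2.12 where `.par` is in the standard library (on 2.13+ it needs the separate scala-parallel-collections module). `ALL_TIMEZONES` here is rebuilt from the JDK for illustration; in the suite it's an existing field, and the loop body stands in for the real cast assertions:

    ```scala
    import java.util.TimeZone

    // Illustrative stand-in for the suite's ALL_TIMEZONES field.
    val ALL_TIMEZONES: Seq[TimeZone] =
      TimeZone.getAvailableIDs.toSeq.map(TimeZone.getTimeZone)

    // Sequential loop: total time is the sum over all zones.
    // Parallel loop: same assertions, same coverage, but wall-clock time is
    // divided by roughly the number of available cores.
    ALL_TIMEZONES.par.foreach { tz =>
      // Placeholder for the suite's cast-string-to-timestamp checks for tz.
      assert(tz.getID.nonEmpty)
    }
    ```

    The assertions themselves are unchanged; only the driver loop becomes parallel, so test coverage stays deterministic and complete.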

