Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19816
@felixcheung, I just tried lowering this by default and ran the tests. Some tests fail as a result. For example, if we lower `spark.sql.shuffle.partitions` to 5, these fail additionally:
```
Failed
-
1. Failure: spark.als (@test_mllib_recommendation.R#36)
predictions$prediction not equal to c(-0.1380762, 2.6258414, -1.5018409).
3/3 mismatches (average diff: 2.75)
[1] 2.626 - -0.138 == 2.76
[2] -1.502 - 2.626 == -4.13
[3] -0.138 - -1.502 == 1.36
2. Failure: pivot GroupedData column (@test_sparkSQL.R#1921)
---
`sum1` not equal to `correct_answer`.
Component "year": Mean relative difference: 0.0004961548
Component "Python": Mean relative difference: 0.0952381
Component "R": Mean relative difference: 0.5454545
3. Failure: pivot GroupedData column (@test_sparkSQL.R#1922)
---
`sum2` not equal to `correct_answer`.
Component "year": Mean relative difference: 0.0004961548
Component "Python": Mean relative difference: 0.0952381
Component "R": Mean relative difference: 0.5454545
4. Failure: pivot GroupedData column (@test_sparkSQL.R#1923)
---
`sum3` not equal to `correct_answer`.
Component "year": Mean relative difference: 0.0004961548
Component "Python": Mean relative difference: 0.0952381
Component "R": Mean relative difference: 0.5454545
5. Failure: pivot GroupedData column (@test_sparkSQL.R#1924)
---
`sum4` not equal to correct_answer[, c("year", "R")].
Component "year": Mean relative difference: 0.0004961548
Component "R": Mean relative difference: 0.5454545
```
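For reference, here is a minimal sketch of how the setting could be lowered for a SparkR session. This assumes a plain `sparkR.session` call rather than the actual SparkR test harness, so the placement is illustrative only; the value `"5"` just mirrors the run above (the default is 200).
```r
library(SparkR)

# Start (or reuse) a SparkR session with fewer shuffle partitions.
sparkR.session(
  master = "local[*]",
  sparkConfig = list(spark.sql.shuffle.partitions = "5")
)

# Confirm the effective value.
sparkR.conf("spark.sql.shuffle.partitions")
```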
Shuffle + R worker cases don't seem that common (to be clear, a shuffle without an R worker should be fine, IIUC).
I don't have a strong opinion on lowering it: if we don't lower it, some tests could hit this problem again in the future; if we do lower it, the required change looks fairly large and this case may not come up often.