Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19816
@felixcheung, I just tried lowering this by default and ran the tests. Some tests fail as a result. For example, if we lower `spark.sql.shuffle.partitions` to 5, these fail additionally:
```
Failed
-
1. Failure: spark.als (@test_mllib_recommendation.R#36)
predictions$prediction not equal to c(-0.1380762, 2.6258414, -1.5018409).
3/3 mismatches (average diff: 2.75)
[1] 2.626 - -0.138 == 2.76
[2] -1.502 - 2.626 == -4.13
[3] -0.138 - -1.502 == 1.36
2. Failure: pivot GroupedData column (@test_sparkSQL.R#1921)
---
`sum1` not equal to `correct_answer`.
Component "year": Mean relative difference: 0.0004961548
Component "Python": Mean relative difference: 0.0952381
Component "R": Mean relative difference: 0.5454545
3. Failure: pivot GroupedData column (@test_sparkSQL.R#1922)
---
`sum2` not equal to `correct_answer`.
Component "year": Mean relative difference: 0.0004961548
Component "Python": Mean relative difference: 0.0952381
Component "R": Mean relative difference: 0.5454545
4. Failure: pivot GroupedData column (@test_sparkSQL.R#1923)
---
`sum3` not equal to `correct_answer`.
Component "year": Mean relative difference: 0.0004961548
Component "Python": Mean relative difference: 0.0952381
Component "R": Mean relative difference: 0.5454545
5. Failure: pivot GroupedData column (@test_sparkSQL.R#1924)
---
`sum4` not equal to correct_answer[, c("year", "R")].
Component "year": Mean relative difference: 0.0004961548
Component "R": Mean relative difference: 0.5454545
```
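For reference, here is a minimal sketch of how the setting could be lowered for a SparkR session. This assumes a plain `sparkR.session` call rather than the actual SparkR test harness, so the placement is illustrative only; the value `"5"` just mirrors the run above (the default is 200).
```r
library(SparkR)

# Start (or reuse) a SparkR session with fewer shuffle partitions.
sparkR.session(
  master = "local[*]",
  sparkConfig = list(spark.sql.shuffle.partitions = "5")
)

# Confirm the effective value.
sparkR.conf("spark.sql.shuffle.partitions")
```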
Shuffle + R worker cases don't seem that common (to be clear, a shuffle without an R worker should be fine, IIUC).
I don't have a strong opinion on lowering it: if we don't lower it, some tests could hit this problem again in the future; if we do lower it, the required change looks fairly large and this case may not come up often.