[GitHub] [spark] zhengruifeng commented on a diff in pull request #37918: [SPARK-40476][ML][SQL] Reduce the shuffle size of ALS

2022-09-22 Thread GitBox
zhengruifeng commented on code in PR #37918:
URL: https://github.com/apache/spark/pull/37918#discussion_r977466819

## sql/core/src/main/scala/org/apache/spark/sql/functions.scala:
##
@@ -367,6 +367,9 @@ object functions {
    */
   def collect_set(columnName: String): Column =

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37918: [SPARK-40476][ML][SQL] Reduce the shuffle size of ALS

2022-09-19 Thread GitBox
zhengruifeng commented on code in PR #37918:
URL: https://github.com/apache/spark/pull/37918#discussion_r974838677

## mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala:
##
@@ -496,18 +499,23 @@ class ALSModel private[ml] (
   .iterator.map { j =>