huaxingao commented on a change in pull request #27389:
[SPARK-30662][ML][PySpark] ALS/MLP extend HasBlockSize
URL: https://github.com/apache/spark/pull/27389#discussion_r372736585
##########
File path: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
##########
@@ -288,6 +289,15 @@ class ALSModel private[ml] (
@Since("2.2.0")
def setColdStartStrategy(value: String): this.type = set(coldStartStrategy, value)
+ /**
+ * Set block size for stacking input data in matrices.
+ * Default is 4096.
Review comment:
Thanks for the comment. I did see the default changed to 1024 in that PR, but I want the default to stay 4096; that's why I set it explicitly at line 675 in the Estimator:
```setDefault(blockSize -> 4096)```.
I want the default to be 4096 because ```blockify``` already uses 4096 as its
default, and I don't want to change the current default value.
```
private def blockify(
factors: Dataset[(Int, Array[Float])],
blockSize: Int = 4096): Dataset[Seq[(Int, Array[Float])]] = {
import factors.sparkSession.implicits._
factors.mapPartitions(_.grouped(blockSize))
}
```
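For intuition, `mapPartitions(_.grouped(blockSize))` just splits each partition's iterator into fixed-size chunks (the last chunk may be smaller). A minimal Python sketch of that grouping behavior, with a hypothetical `grouped` helper standing in for Scala's `Iterator.grouped`:

```python
from itertools import islice

def grouped(iterator, block_size):
    """Yield successive lists of at most block_size items,
    mimicking Scala's Iterator.grouped as used in blockify."""
    it = iter(iterator)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block

# A partition of 10 (id, factor) rows grouped into blocks of 4
# produces blocks of sizes 4, 4, and 2:
rows = [(i, [float(i)]) for i in range(10)]
blocks = list(grouped(rows, 4))
```

With the real default of 4096, a typical partition yields far fewer, much larger blocks, which is what makes the subsequent blocked dot products efficient.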