Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/242#discussion_r11406772
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -421,12 +421,12 @@ class ALS private (
    * Compute the new feature vectors for a block of the users matrix given the list of factors
    * it received from each product and its InLinkBlock.
    */
-  private def updateBlock(messages: Seq[(Int, Array[Array[Double]])], inLinkBlock: InLinkBlock,
+  private def updateBlock(messages: Iterable[(Int, Array[Array[Double]])], inLinkBlock: InLinkBlock,
       rank: Int, lambda: Double, alpha: Double, YtY: Option[Broadcast[DoubleMatrix]])
     : Array[Array[Double]] =
   {
     // Sort the incoming block factor messages by block ID and make them an array
-    val blockFactors = messages.sortBy(_._1).map(_._2).toArray // Array[Array[Double]]
+    val blockFactors = messages.toArray.sortBy(_._1).map(_._2) // Array[Array[Double]]
--- End diff --
Looking at the behaviour of Seq in the Scala shell, it looks like we can
replace this toArray with toSeq and avoid the performance hit for now: calling
toSeq on something that is already a Seq appears to return it as-is rather than
copying it. Once we start passing in Iterables that are not Seqs, we would
actually get benefits from the change.
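
A minimal sketch of what I mean, not the actual ALS code: `ToSeqSketch`,
`Factors` and `sortedFactors` are hypothetical names made up for illustration,
and the no-copy behaviour of toSeq is what I see for the standard Seq
implementations, so treat it as an observation rather than a guarantee.

```scala
// Hypothetical stand-in for the sorting step in updateBlock; not the Spark source.
object ToSeqSketch {
  type Factors = Array[Array[Double]]

  // Sort the block factor messages by block ID and keep only the factors.
  // toSeq is used instead of toArray before sorting.
  def sortedFactors(messages: Iterable[(Int, Factors)]): Array[Factors] =
    messages.toSeq.sortBy(_._1).map(_._2).toArray

  def main(args: Array[String]): Unit = {
    val asSeq: Seq[(Int, Factors)] = Seq(
      (2, Array(Array(2.0))),
      (0, Array(Array(0.0))),
      (1, Array(Array(1.0)))
    )
    // Reference equality: checks whether toSeq handed back the same instance.
    println(asSeq.toSeq eq asSeq)

    // A Map is an Iterable of pairs but not a Seq, so toSeq has to build one here.
    val asMap = Map(1 -> Array(Array(1.0)), 0 -> Array(Array(0.0)))
    println(sortedFactors(asMap).map(_(0)(0)).mkString(","))
  }
}
```

The call site stays the same for both inputs; only the non-Seq case pays for
building a new collection before the sort.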