[GitHub] spark pull request: Spark 1271 (1320) cogroup and groupby should p...

holdenk Tue, 08 Apr 2014 12:27:35 -0700

Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/242#discussion_r11406772
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -421,12 +421,12 @@ class ALS private (
        * Compute the new feature vectors for a block of the users matrix given the list of factors
        * it received from each product and its InLinkBlock.
        */
    -  private def updateBlock(messages: Seq[(Int, Array[Array[Double]])], inLinkBlock: InLinkBlock,
    +  private def updateBlock(messages: Iterable[(Int, Array[Array[Double]])], inLinkBlock: InLinkBlock,
           rank: Int, lambda: Double, alpha: Double, YtY: Option[Broadcast[DoubleMatrix]])
         : Array[Array[Double]] =
       {
         // Sort the incoming block factor messages by block ID and make them an array
    -    val blockFactors = messages.sortBy(_._1).map(_._2).toArray // Array[Array[Double]]
    +    val blockFactors = messages.toArray.sortBy(_._1).map(_._2) // Array[Array[Double]]
    --- End diff --
    
    Looking at the behaviour of Seq in the Scala shell, it looks like we can
    replace this toArray with toSeq and avoid the performance hit for now, since
    calling toSeq on something that is already a Seq should not copy it. Once we
    start passing Iterables that are not Seqs, we will actually get the benefits
    of the change.
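
    A minimal sketch of the suggestion, for illustration only. `ToSeqSketch` and
    `sortedFactors` are hypothetical names that do not appear in ALS.scala, and
    placing `toArray` at the end of the chain is an assumption about how the
    suggested line would read. The `eq` check shows that `toSeq` on a `List`
    (what `Seq(...)` builds by default) hands back the same instance instead of
    copying.

```scala
object ToSeqSketch {
  // Hypothetical stand-in for the sort-and-project step in updateBlock:
  // convert with toSeq (not toArray), sort by block ID, keep only the factors.
  def sortedFactors(
      messages: Iterable[(Int, Array[Array[Double]])]): Array[Array[Array[Double]]] =
    messages.toSeq.sortBy(_._1).map(_._2).toArray

  def main(args: Array[String]): Unit = {
    // Seq(...) builds a List, so this Iterable is already a Seq.
    val messages: Seq[(Int, Array[Array[Double]])] = Seq(
      (2, Array(Array(2.0))),
      (0, Array(Array(0.0))),
      (1, Array(Array(1.0))))

    // toSeq on a List returns the same object, so no copy is made here.
    println(messages.toSeq eq messages)

    // Factors come back ordered by block ID.
    println(sortedFactors(messages).map(_(0)(0)).mkString(","))
  }
}
```

    The same helper accepts any other Iterable (a Set or a Map's values, for
    example), in which case toSeq does build a new sequence before sorting.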



