GitHub user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/493#discussion_r11986066
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -708,6 +708,86 @@ object ALS {
         trainImplicit(ratings, rank, iterations, 0.01, -1, 1.0)
       }
     
    +  /**
    +   * :: DeveloperApi ::
    +   * Given an RDD of ratings, a rank, and two partitioners, compute rough estimates of the
    +   * computation time and communication cost of one iteration of ALS.  Returns a pair of pairs of
    --- End diff --
    
    The return format is hard to understand. For each user block, we want to know three estimates:
    
    1. how much data comes in during a user iteration;
    2. how much computation (YtY and LS) it does during a user iteration;
    3. how much data goes out during a product iteration.
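One rough way to turn estimate 2 into a number, assuming explicit-feedback ALS with a dense $k \times k$ solve per user ($k$ = rank), for a user block with $n$ users and $r$ ratings:

$$\text{compute}_{\text{block}} \approx O\!\left(r\,k^2 + n\,k^3\right)$$

Here the $r\,k^2$ term accumulates $\sum_j y_j y_j^\top$ over each user's rated products, and the $n\,k^3$ term solves one $k \times k$ least-squares system per user. With implicit feedback, forming $Y^\top Y$ once over all $m$ products adds roughly $m\,k^2$. This is a back-of-the-envelope model under those assumptions, not a measurement.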
    
    We can create a case class `Cost(index, n, numRatings, dataOut, dataIn)`, where `n` is the number of users/products in the block. The output type then becomes `(Seq[Cost], Seq[Cost])`: user block costs and product block costs.

