GitHub user danielyli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17793#discussion_r115112172
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -910,26 +944,143 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
       private type FactorBlock = Array[Array[Float]]
     
       /**
    -   * Out-link block that stores, for each dst (item/user) block, which src (user/item) factors to
    -   * send. For example, outLinkBlock(0) contains the local indices (not the original src IDs) of the
    -   * src factors in this block to send to dst block 0.
    +   * A mapping of the columns of the items factor matrix that are needed when calculating each row
    +   * of the users factor matrix, and vice versa.
    +   *
    +   * Specifically, when calculating a user factor vector, since only those columns of the items
    +   * factor matrix that correspond to the items that that user has rated are needed, we can avoid
    +   * having to repeatedly copy the entire items factor matrix to each worker later in the algorithm
    +   * by precomputing these dependencies for all users, storing them in an RDD of `OutBlock`s.  The
    +   * items' dependencies on the columns of the users factor matrix is computed similarly.
    +   *
    +   * =Example=
    +   *
    +   * Using the example provided in the `InBlock` Scaladoc, `userOutBlocks` would look like the
    +   * following:
    +   *
    +   * {{{ userOutBlocks.collect() == Seq(
    +   *       0 -> Array(Array(0, 1), Array(0, 1)),
    +   *       1 -> Array(Array(0), Array(0))) }}}
    +   *
    +   * The data structure encodes the following information:
    --- End diff --
    
    Yeah, I agree, it could be clearer.  Let me rewrite it, taking into account your suggestions, and update the PR.
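
For readers following the thread, the idea in the quoted Scaladoc (for each src block, record which local src factor indices must be sent to each dst block) can be sketched in plain Scala without Spark. This is only an illustration, not the code in ALS.scala: the names `OutBlockSketch`, `blockOf`, and `buildOutBlocks` are hypothetical, the modulo partitioning rule is an assumption, and the ratings used in the note below are made up rather than taken from the `InBlock` Scaladoc example.

```scala
object OutBlockSketch {
  // Hypothetical stand-in for the OutBlock type discussed in the diff:
  // outBlock(d) holds the local indices of the src factors to send to dst block d.
  type OutBlock = Array[Array[Int]]

  // Assumed partitioning rule for this sketch only: non-negative ID modulo block count.
  def blockOf(id: Int, numBlocks: Int): Int = id % numBlocks

  // ratings are (srcId, dstId) pairs, e.g. (user, item) when building user out-blocks.
  def buildOutBlocks(
      ratings: Seq[(Int, Int)],
      numSrcBlocks: Int,
      numDstBlocks: Int): Map[Int, OutBlock] = {
    ratings
      .groupBy { case (src, _) => blockOf(src, numSrcBlocks) }
      .map { case (srcBlockId, blockRatings) =>
        // Local index = position of a src ID among the block's sorted, unique src IDs.
        val localIndex: Map[Int, Int] =
          blockRatings.map(_._1).distinct.sorted.zipWithIndex.toMap
        // For every dst block, collect the local indices of the src IDs that
        // have at least one rating landing in that dst block.
        val outBlock: OutBlock = Array.tabulate(numDstBlocks) { dstBlockId =>
          blockRatings
            .collect {
              case (src, dst) if blockOf(dst, numDstBlocks) == dstBlockId =>
                localIndex(src)
            }
            .distinct
            .sorted
            .toArray
        }
        srcBlockId -> outBlock
      }
  }
}
```

With the made-up ratings `Seq((0, 0), (0, 1), (1, 0), (2, 1), (3, 1))` and two blocks on each side, user block 0 holds users 0 and 2 (local indices 0 and 1) and maps to `Array(Array(0), Array(0, 1))`, while user block 1 holds users 1 and 3 and maps to `Array(Array(0), Array(1))`. Only the listed factors need to be shipped to each item block, which is the saving the Scaladoc describes.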

