[GitHub] spark pull request: [SPARK-6717][ML] Clear shuffle files after che...

mengxr Mon, 02 May 2016 00:01:46 -0700

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11919#discussion_r61708653
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
    @@ -620,4 +663,82 @@ object ALSSuite {
         "intermediateStorageLevel" -> "MEMORY_ONLY",
         "finalStorageLevel" -> "MEMORY_AND_DISK_SER"
       )
    +
    +  // Helper functions to generate test data we share between ALS test suites
    +
    +  /**
    +   * Generates random user/item factors, with i.i.d. values drawn from U(a, b).
    +   * @param size number of users/items
    +   * @param rank number of features
    +   * @param random random number generator
    +   * @param a min value of the support (default: -1)
    +   * @param b max value of the support (default: 1)
    +   * @return a sequence of (ID, factors) pairs
    +   */
    +  private def genFactors(
    +      size: Int,
    +      rank: Int,
    +      random: Random,
    +      a: Float = -1.0f,
    +      b: Float = 1.0f): Seq[(Int, Array[Float])] = {
    +    require(size > 0 && size < Int.MaxValue / 3)
    +    require(b > a)
    +    val ids = mutable.Set.empty[Int]
    +    while (ids.size < size) {
    +      ids += random.nextInt()
    +    }
    +    val width = b - a
    +    ids.toSeq.sorted.map(id => (id, Array.fill(rank)(a + random.nextFloat() * width)))
    +  }
    +
    +  /**
    +   * Generates an implicit feedback dataset for testing ALS.
    +   *
    +   * @param sc SparkContext
    +   * @param numUsers number of users
    +   * @param numItems number of items
    +   * @param rank rank
    +   * @param noiseStd the standard deviation of additive Gaussian noise on training data
    +   * @param seed random seed
    +   * @return (training, test)
    +   */
    +  def genImplicitTestData(
    +      sc: SparkContext,
    +      numUsers: Int,
    +      numItems: Int,
    +      rank: Int,
    +      noiseStd: Double = 0.0,
    +      seed: Long = 11L): (RDD[Rating[Int]], RDD[Rating[Int]]) = {
    +      // The assumption of the implicit feedback model is that unobserved ratings are more likely to
    --- End diff --
    
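    Fix indentation: this comment line is indented six spaces, but statements in the method body should be indented four.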
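
    For anyone who wants to try the quoted `genFactors` helper outside the test suite, here is a minimal standalone sketch. The wrapper object `GenFactorsSketch`, the `main` method, and the seed `42L` are assumptions added for illustration and are not part of the pull request; the helper body is copied from the diff above, with `private` dropped so it can be called directly.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical wrapper object, not part of the pull request.
object GenFactorsSketch {

  // Copied from the quoted diff: draws `size` distinct random IDs, then pairs
  // each ID (in sorted order) with `rank` values sampled i.i.d. from U(a, b).
  def genFactors(
      size: Int,
      rank: Int,
      random: Random,
      a: Float = -1.0f,
      b: Float = 1.0f): Seq[(Int, Array[Float])] = {
    require(size > 0 && size < Int.MaxValue / 3)
    require(b > a)
    val ids = mutable.Set.empty[Int]
    while (ids.size < size) {
      ids += random.nextInt()
    }
    val width = b - a
    ids.toSeq.sorted.map(id => (id, Array.fill(rank)(a + random.nextFloat() * width)))
  }

  def main(args: Array[String]): Unit = {
    // Seed chosen arbitrarily for this illustration.
    val factors = genFactors(size = 5, rank = 3, random = new Random(42L))
    // Print each generated (ID, factors) pair.
    factors.foreach { case (id, values) =>
      println(s"$id -> ${values.mkString(", ")}")
    }
  }
}
```

    The IDs come back sorted and distinct because they are collected in a set before sorting, and each factor array has exactly `rank` entries within the requested range.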


