[GitHub] spark pull request #13509: [SPARK-15740] [MLLIB] Word2VecSuite "big model lo...

jkbradley Tue, 21 Jun 2016 12:52:40 -0700

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13509#discussion_r67939355
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/Word2VecSuite.scala ---
    @@ -91,20 +91,39 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext {
     
       }
     
    -  ignore("big model load / save") {
    -    // create a model bigger than 32MB since 9000 * 1000 * 4 > 2^25
    -    val word2VecMap = Map((0 to 9000).map(i => s"$i" -> Array.fill(1000)(0.1f)): _*)
    +  test("big model load / save") {
    +    // backupping old values
    +    val oldBufferConfValue = spark.conf.get("spark.kryoserializer.buffer.max", "64m")
    +    val oldBufferMaxConfValue = spark.conf.get("spark.kryoserializer.buffer", "64k")
    +
    +    // setting test values to trigger partitioning
    +    spark.conf.set("spark.kryoserializer.buffer", "50b")
    +    spark.conf.set("spark.kryoserializer.buffer.max", "50b")
    +
    +    // create a model bigger than 50 Bytes
    +    val word2VecMap = Map((0 to 10).map(i => s"$i" -> Array.fill(10)(0.1f)): _*)
         val model = new Word2VecModel(word2VecMap)
     
    +    // est. size of this model, given the formula:
    +    // (floatSize * vectorSize + 15) * numWords
    +    // (4 * 10 + 15) * 10 = 550
    +    // therefore it should generate 12 partitions
    --- End diff --
    
    "12 partitions" --> "multiple partitions"  (The exact number isn't important.)
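
    To illustrate why the exact count is fragile, here is a minimal sketch of
    the arithmetic in the diff comment. The size formula is the one quoted
    above; the partition rule (approxSize / bufferMax + 1) is an assumption
    chosen because it reproduces the "12" in the comment, and the object and
    method names are hypothetical, not Spark API.

```scala
// Hypothetical helper, not part of Spark: mirrors the estimate in the diff comment.
object PartitionEstimate {
  // Size of one Float in bytes.
  val floatSize: Long = 4L

  // Formula quoted in the diff: (floatSize * vectorSize + 15) * numWords
  def approxSize(vectorSize: Int, numWords: Int): Long =
    (floatSize * vectorSize + 15) * numWords

  // ASSUMPTION: one partition per full buffer, plus one for the remainder.
  def numPartitions(approxSizeBytes: Long, bufferMaxBytes: Long): Int =
    (approxSizeBytes / bufferMaxBytes + 1).toInt
}

object PartitionEstimateDemo extends App {
  val bufferMax = 50L // "50b" in the test configuration

  // The comment assumes 10 words of 10 floats each.
  val commented = PartitionEstimate.approxSize(vectorSize = 10, numWords = 10)
  println(s"$commented bytes -> ${PartitionEstimate.numPartitions(commented, bufferMax)} partitions")

  // `0 to 10` is an inclusive range, so the map in the test holds 11 words.
  val numWords = (0 to 10).size
  val actual = PartitionEstimate.approxSize(vectorSize = 10, numWords = numWords)
  println(s"$actual bytes -> ${PartitionEstimate.numPartitions(actual, bufferMax)} partitions")
}
```

    Under these assumptions, a one-word difference in the map is enough to
    change the estimated count, so a test comment that only promises
    "multiple partitions" stays correct.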



