zhengruifeng commented on pull request #35457:
URL: https://github.com/apache/spark/pull/35457#issuecomment-1034440217
I think I made it too complex.
According to @anders-rydbirk's description in the ticket:
```
Possible workaround:
Roll back to Spark 3.0.0 since a KMeansModel generated with 3.0.0 cannot
be loaded in 3.1.1.
Reduce K. Currently trying with 45000.
```
maybe we just need to change `k * (k + 1) / 2` to `(k.toLong * (k + 1) / 2).toInt`? For large `k`, the intermediate product `k * (k + 1)` overflows `Int` even though the final size still fits:
```scala
scala> val k = 50000
val k: Int = 50000
scala> k * (k + 1) / 2
val res8: Int = -897458648
scala> (k.toLong * (k + 1) / 2).toInt
val res9: Int = 1250025000
scala> val k = 45000
val k: Int = 45000
scala> k * (k + 1) / 2
val res10: Int = 1012522500
scala> (k.toLong * (k + 1) / 2).toInt
val res11: Int = 1012522500
```
> Sorry, I guess I mean make it into an array of arrays, not one big array.
@srowen yes, using arrays of sizes (1, 2, ..., k) is another choice
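To make the two layouts concrete, here is a minimal sketch (illustrative only, not the actual KMeansModel code; `packedSize` and `triangular` are hypothetical names) of the flat packed triangle versus the array-of-arrays alternative:

```scala
// Flat packed lower triangle of a symmetric k x k matrix:
// entry (i, j) with j <= i lives at index i*(i+1)/2 + j.
// The total length k*(k+1)/2 must be computed in Long, because
// the intermediate product k*(k+1) overflows Int for large k.
def packedSize(k: Int): Int = (k.toLong * (k + 1) / 2).toInt

// Array-of-arrays alternative: row i has length i + 1, so no single
// allocation or index computation involves k*(k+1) at all.
def triangular(k: Int): Array[Array[Double]] =
  Array.tabulate(k)(i => new Array[Double](i + 1))
```

The array-of-arrays form sidesteps the overflow entirely, at the cost of one small object header per row; the packed form keeps a single contiguous array but needs the `Long` widening shown above.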
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]