srowen commented on code in PR #44690:
URL: https://github.com/apache/spark/pull/44690#discussion_r1474340184


##########
core/src/main/scala/org/apache/spark/resource/ResourceAllocator.scala:
##########
@@ -20,6 +20,49 @@ package org.apache.spark.resource
 import scala.collection.mutable
 
 import org.apache.spark.SparkException
+import org.apache.spark.resource.ResourceAmountUtils.ONE_ENTIRE_RESOURCE
+
+private[spark] object ResourceAmountUtils {
+  /**
+   * Using "double" to do the resource calculation may encounter a problem of precision loss. Eg
+   *
+   * scala> val taskAmount = 1.0 / 9
+   * taskAmount: Double = 0.1111111111111111
+   *
+   * scala> var total = 1.0
+   * total: Double = 1.0
+   *
+   * scala> for (i <- 1 to 9 ) {
+   * |   if (total >= taskAmount) {
+   * |           total -= taskAmount
+   * |           println(s"assign $taskAmount for task $i, total left: $total")
+   * |   } else {
+   * |           println(s"ERROR Can't assign $taskAmount for task $i, total left: $total")
+   * |   }
+   * | }
+   * assign 0.1111111111111111 for task 1, total left: 0.8888888888888888
+   * assign 0.1111111111111111 for task 2, total left: 0.7777777777777777
+   * assign 0.1111111111111111 for task 3, total left: 0.6666666666666665
+   * assign 0.1111111111111111 for task 4, total left: 0.5555555555555554
+   * assign 0.1111111111111111 for task 5, total left: 0.44444444444444425
+   * assign 0.1111111111111111 for task 6, total left: 0.33333333333333315
+   * assign 0.1111111111111111 for task 7, total left: 0.22222222222222204
+   * assign 0.1111111111111111 for task 8, total left: 0.11111111111111094
+   * ERROR Can't assign 0.1111111111111111 for task 9, total left: 0.11111111111111094
+   *
+   * So we multiply ONE_ENTIRE_RESOURCE to convert the double to long to avoid this limitation.
+   * Double can display up to 16 decimal places, so we set the factor to
+   * 10,000,000,000,000,000L.
+   */
+  final val ONE_ENTIRE_RESOURCE: Long = 10000000000000000L
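
To make the doc comment's point concrete, here is a minimal Scala sketch that runs the same greedy loop twice: once with Double bookkeeping, and once with amounts scaled by the same 1e16 factor and kept as Long. It is not the code from the PR: `FixedPointDemo`, `toInternal`, `assignWithDouble` and `assignWithLong` are hypothetical names, and converting by multiplying and rounding to the nearest unit is an assumption.

```scala
// Hypothetical sketch, not Spark's implementation: compares Double bookkeeping
// with Long fixed-point bookkeeping for the loop shown in the doc comment.
object FixedPointDemo {
  // Same scaling factor as ONE_ENTIRE_RESOURCE in the diff: 1 resource = 1e16 units.
  val OneEntireResource: Long = 10000000000000000L

  // Assumed conversion: scale the fractional amount, then round to the nearest unit.
  def toInternal(amount: Double): Long = math.round(amount * OneEntireResource)

  // Greedily gives `taskAmount` to up to `numTasks` tasks from one whole resource (Double math).
  def assignWithDouble(taskAmount: Double, numTasks: Int): Int = {
    var total = 1.0
    var assigned = 0
    for (_ <- 1 to numTasks) {
      if (total >= taskAmount) {
        total -= taskAmount
        assigned += 1
      }
    }
    assigned
  }

  // Same loop, but all bookkeeping is done in Long internal units.
  // Returns (tasks assigned, internal units left over).
  def assignWithLong(taskAmount: Double, numTasks: Int): (Int, Long) = {
    val perTask = toInternal(taskAmount)
    var total = OneEntireResource
    var assigned = 0
    for (_ <- 1 to numTasks) {
      if (total >= perTask) {
        total -= perTask
        assigned += 1
      }
    }
    (assigned, total)
  }

  def main(args: Array[String]): Unit = {
    println(s"Double: ${assignWithDouble(1.0 / 9, 9)} of 9 tasks assigned")
    println(s"Long:   ${assignWithLong(1.0 / 9, 9)}")
  }
}
```

Consistent with the REPL output in the doc comment, the Double version assigns only 8 of the 9 tasks, while the Long version assigns all 9 and leaves 1 internal unit, i.e. 1e-16 of the resource.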

Review Comment:
   I agree it also doesn't feel wholly satisfying. In this example, any of those values would work: even at 0.11 per task, 9 tasks leave less than 0.11 remaining, so you schedule 9. But using 0.11 would also imply there is 0.01 GPU left over, which isn't the intent. In practice, I strongly doubt anyone ever schedules more than, say, 100 tasks on one GPU.
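
A rough sketch of the 0.11 arithmetic, under the same hypothetical 1e16 fixed-point conversion (scale, then round to the nearest unit): integer division gives the number of task slots per resource, and the remainder is the leftover fraction. `SlotsPerResource` and `slotsAndLeftover` are illustrative names, not Spark APIs.

```scala
// Hypothetical sketch: how many tasks of a given fractional size fit on one
// whole resource in fixed-point units, and what fraction is left over.
object SlotsPerResource {
  // Assumed to match ONE_ENTIRE_RESOURCE: 1 resource = 1e16 internal units.
  val OneEntireResource: Long = 10000000000000000L

  // Assumed conversion: scale, then round to the nearest internal unit.
  def toInternal(amount: Double): Long = math.round(amount * OneEntireResource)

  // Returns (number of task slots, leftover as a fraction of one resource).
  def slotsAndLeftover(taskAmount: Double): (Long, Double) = {
    val perTask = toInternal(taskAmount)
    require(perTask > 0, s"task amount $taskAmount is below the 1e-16 resolution")
    val slots = OneEntireResource / perTask           // Long division: whole slots only
    val leftoverUnits = OneEntireResource - slots * perTask
    (slots, leftoverUnits.toDouble / OneEntireResource)
  }
}
```

Under these assumptions, `slotsAndLeftover(0.11)` gives 9 slots with 0.01 of the GPU left over, while `slotsAndLeftover(1.0 / 9)` gives 9 slots with 1e-16 left over.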
   
   (But what about non-GPU resources? There aren't any now. Are there resources you'd schedule in very, very small fractions? I can't think of any, even in the future.)
   
   Going down the rabbit hole of floating-point precision, I think we hit that no matter what. If I ask for 1e-16 of a resource, any way we interpret that is probably slightly imprecise, since it gets parsed as a floating-point value somewhere. But these are unrealistic use cases.
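
The resolution floor mentioned above can be sketched as follows, again assuming the hypothetical conversion (scale by 1e16, round to the nearest unit). With that factor, 1e-16 maps to one internal unit, and anything much smaller rounds to zero and would have to be rejected. `ResolutionLimit` and `checkedToInternal` are illustrative names, not Spark APIs.

```scala
// Hypothetical sketch of the resolution floor implied by a 1e16 scaling factor.
object ResolutionLimit {
  // Assumed to match ONE_ENTIRE_RESOURCE: 1 resource = 1e16 internal units.
  val OneEntireResource: Long = 10000000000000000L

  // Assumed conversion: scale, then round to the nearest internal unit.
  def toInternal(amount: Double): Long = math.round(amount * OneEntireResource)

  // Rejects requests that would round to zero internal units.
  def checkedToInternal(amount: Double): Either[String, Long] = {
    val units = toInternal(amount)
    if (units > 0) Right(units)
    else Left(s"$amount is below the smallest representable amount (1e-16)")
  }
}
```

For example, `checkedToInternal(1e-16)` returns `Right(1)`, while `checkedToInternal(1e-17)` returns a `Left`.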



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
