srowen commented on code in PR #44690:
URL: https://github.com/apache/spark/pull/44690#discussion_r1475475431
##########
core/src/main/scala/org/apache/spark/resource/ResourceAllocator.scala:
##########
@@ -20,6 +20,49 @@ package org.apache.spark.resource
import scala.collection.mutable
import org.apache.spark.SparkException
+import org.apache.spark.resource.ResourceAmountUtils.ONE_ENTIRE_RESOURCE
+
+private[spark] object ResourceAmountUtils {
+ /**
+ * Using "double" to do the resource calculation may encounter a problem of precision loss, e.g.:
+ *
+ * scala> val taskAmount = 1.0 / 9
+ * taskAmount: Double = 0.1111111111111111
+ *
+ * scala> var total = 1.0
+ * total: Double = 1.0
+ *
+ * scala> for (i <- 1 to 9 ) {
+ * | if (total >= taskAmount) {
+ * | total -= taskAmount
+ * | println(s"assign $taskAmount for task $i, total left: $total")
+ * | } else {
+ * | println(s"ERROR Can't assign $taskAmount for task $i, total left: $total")
+ * | }
+ * | }
+ * assign 0.1111111111111111 for task 1, total left: 0.8888888888888888
+ * assign 0.1111111111111111 for task 2, total left: 0.7777777777777777
+ * assign 0.1111111111111111 for task 3, total left: 0.6666666666666665
+ * assign 0.1111111111111111 for task 4, total left: 0.5555555555555554
+ * assign 0.1111111111111111 for task 5, total left: 0.44444444444444425
+ * assign 0.1111111111111111 for task 6, total left: 0.33333333333333315
+ * assign 0.1111111111111111 for task 7, total left: 0.22222222222222204
+ * assign 0.1111111111111111 for task 8, total left: 0.11111111111111094
+ * ERROR Can't assign 0.1111111111111111 for task 9, total left: 0.11111111111111094
+ *
+ * So we multiply by ONE_ENTIRE_RESOURCE to convert the double to a long and avoid this limitation.
+ * Double can display up to 16 decimal places, so we set the factor to
+ * 10,000,000,000,000,000L (10^16).
+ */
+ final val ONE_ENTIRE_RESOURCE: Long = 10000000000000000L
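The scaled-long idea in this hunk can be checked with a small standalone comparison that reruns the REPL experiment from the comment both ways. This is a sketch, not the PR's code: the names `FixedPointDemo`, `assignedWithDouble` and `assignedWithLong` are hypothetical, and converting with a truncating `(amount * OneEntireResource).toLong` is an assumption about how the double is turned into a long.

```scala
// Hypothetical demo (not Spark code): Double subtraction vs. subtraction on amounts scaled to Long.
object FixedPointDemo {
  // Same scale factor as ONE_ENTIRE_RESOURCE in the diff: 10^16.
  val OneEntireResource: Long = 10000000000000000L

  // Mirrors the REPL loop: count how many of `tasks` get `amount` out of 1.0 using Double.
  def assignedWithDouble(amount: Double, tasks: Int): Int = {
    var total = 1.0
    var assigned = 0
    for (_ <- 1 to tasks) {
      if (total >= amount) { total -= amount; assigned += 1 }
    }
    assigned
  }

  // Same loop on scaled Longs. The conversion truncates toward zero (an assumption here),
  // after which every subtraction and comparison is exact integer arithmetic.
  def assignedWithLong(amount: Double, tasks: Int): Int = {
    val perTask = (amount * OneEntireResource).toLong
    var total = OneEntireResource
    var assigned = 0
    for (_ <- 1 to tasks) {
      if (total >= perTask) { total -= perTask; assigned += 1 }
    }
    assigned
  }
}
```

For a 1/9 share, `assignedWithDouble(1.0 / 9, 9)` assigns 8 tasks, matching the REPL output above, while `assignedWithLong(1.0 / 9, 9)` assigns all 9, since 1.0 / 9 scales to 1111111111111111 and nine of those still fit in 10^16.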
Review Comment:
I don't think we can solve floating-point accuracy here in the general case,
and this will virtually never arise anyway, except in one important class of
cases: a per-task share of 1/n of a GPU, where n < 10 and n is relatively prime
to 10 (even n = 3). A person writing down the resource utilization will almost
surely write "0.333", but one can imagine supplying `str(1./3.)`
programmatically, and then this issue could arise. A string like that parses to
a double that is not exactly 1/n, and the rounding can land on the wrong side,
so that the last of the n tasks no longer fits, as in the 1/9 example above.
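The class of problematic shares can be pinned down with a small check: 1/n has a finite decimal expansion exactly when n has no prime factors other than 2 and 5, so only those shares can be written down exactly as a decimal string. The object and method names below are hypothetical, for illustration only. Note that the exact condition also picks up n = 6, which is not relatively prime to 10.

```scala
// Hypothetical helper: which shares 1/n can be written exactly as a finite decimal string?
object DecimalShares {
  // 1/n terminates in base 10 iff n has no prime factors other than 2 and 5.
  def hasFiniteDecimal(n: Int): Boolean = {
    require(n > 0, s"n must be positive, got: $n")
    var m = n
    while (m % 2 == 0) m /= 2
    while (m % 5 == 0) m /= 5
    m == 1
  }

  // Values of n in 1..maxN whose share 1/n can only be written approximately (e.g. "0.333").
  def approximateOnly(maxN: Int): Seq[Int] = (1 to maxN).filterNot(hasFiniteDecimal)
}
```

For n < 10 this yields 3, 6, 7 and 9; shares such as 1/2, 1/4, 1/5 and 1/8 ("0.5", "0.25", "0.2", "0.125") can be written exactly.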
The other issue is the approach itself: just multiplying by a long. Couldn't
this be done properly with BigDecimal instead, at least? I also don't think this
should touch so many parts of the code. Surely this only affects how the
resource request string is parsed and compared to the available resources.
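A minimal sketch of the BigDecimal alternative, confined to parsing the request string and comparing it against what is available. `ExactTaskAmount`, `parse`, `tasksPerResource` and `fits` are hypothetical names, not Spark's API; the sketch assumes amounts are given as fractions of one resource in (0, 1].

```scala
import scala.math.BigDecimal

// Hypothetical helper (not Spark's API): parse the per-task amount string as an exact decimal,
// never going through Double, and do all comparisons in BigDecimal.
object ExactTaskAmount {
  private val One = BigDecimal(1)

  // Parse a per-task fraction of one resource, e.g. "0.25" of a GPU.
  def parse(amount: String): BigDecimal = {
    val v = BigDecimal(amount.trim)
    require(v.signum > 0 && v <= One, s"amount must be in (0, 1], got: $amount")
    v
  }

  // Number of tasks requesting `amount` that fit on one whole resource: floor(1 / amount).
  def tasksPerResource(amount: String): Int = (One quot parse(amount)).toInt

  // Exact check that `tasks` requests of `amount` fit into `available` whole resources.
  def fits(amount: String, tasks: Int, available: Int): Boolean =
    parse(amount) * BigDecimal(tasks) <= BigDecimal(available)
}
```

With exact decimals, nine tasks of "0.1111111111111111" fit on one GPU (their sum is 0.9999999999999999). It does not make 1/3 exact, though: "0.3333333333333334" still allows only two tasks per GPU, which is consistent with the point that the general case can't be solved.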
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]