ScrapCodes commented on a change in pull request #30472:
URL: https://github.com/apache/spark/pull/30472#discussion_r537291078



##########
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala
##########
@@ -51,6 +54,29 @@ private[spark] object KubernetesClientUtils extends Logging {
     propertiesWriter.toString
   }
 
+  object StringLengthOrdering extends Ordering[(String, String)] {
+    override def compare(x: (String, String), y: (String, String)): Int = {
+      // compare based on file length and break the tie with string comparison of keys.
+      (x._1.length + x._2.length).compare(y._1.length + y._2.length) * 10 +
+        x._1.compareTo(y._1)
+    }

Review comment:
       Good questions. Hoping that I have understood them correctly, here is my
   attempt to answer.
   
   1) We need to compare the file sizes _including their names_, because that
   is how they will occupy space in a config map.
   2) We would like to mount (or store in the configMap) as many config files
   as possible, so we sort them by size.
   3) If two files are equal in length, we do not want to declare them equal,
   so the compare expression also adds the string comparison of their names to
   break the tie. The `*10` is meant to give priority to the comparison by
   length. When two files are exactly equal in length, the length term is zero
   and the string comparison of their names decides the result. Only if the
   names are also the same do we say the two files are truly equal. In
   principle this should not happen, because they are files in the same
   directory, i.e. `SPARK_CONF_DIR`. So each file gets its own position in the
   ordering, and no data will be lost.
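
   The ordering described in points 1) to 3) could be sketched as below. This is only a minimal illustration, not the code in this PR: the object name `ConfFileOrderingSketch`, the value `bySizeThenName`, and the sample file names and contents are all hypothetical. The sketch compares a `(size, name)` tuple rather than using the `* 10` weighting, since `String.compareTo` can return values whose magnitude exceeds 10, in which case the name comparison would outweigh the length comparison.

```scala
object ConfFileOrderingSketch {
  // Hypothetical helper: each entry is (file name, file content).
  // Sort key is (name length + content length, name), so the combined size
  // is compared first and the name is used only to break ties.
  val bySizeThenName: Ordering[(String, String)] =
    Ordering.by((kv: (String, String)) => (kv._1.length + kv._2.length, kv._1))

  def main(args: Array[String]): Unit = {
    // Made-up sample entries, standing in for files under SPARK_CONF_DIR.
    val confFiles = Seq(
      "spark-defaults.conf" -> "spark.master=k8s",
      "b.conf" -> "xy",
      "a.conf" -> "zz",
      "log4j.properties" -> "log4j.rootCategory=INFO, console")

    // Smallest entries come first; entries of equal size are ordered by name.
    confFiles.sorted(bySizeThenName).foreach { case (name, content) =>
      println(s"${name.length + content.length} $name")
    }
  }
}
```

   With a key like this, two entries compare as equal only when both their combined size and their names match, which is the property point 3) is after.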
   
   






