srowen commented on a change in pull request #23983: [SPARK-26881][mllib] Heuristic for tree aggregate depth
URL: https://github.com/apache/spark/pull/23983#discussion_r267007403
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
 ##########
 @@ -775,6 +778,27 @@ class RowMatrix @Since("1.0.0") (
         s"The number of rows $m is different from what specified or previously computed: ${nRows}.")
     }
   }
+
+  /**
+   * Computing desired tree aggregate depth necessary to avoid exceeding
+   * driver.MaxResultSize during aggregation.
+   * Based on the formulae: (numPartitions)^(1/depth) * objectSize <= DriverMaxResultSize
+   * @param aggregatedObjectSizeInMb the size, in megabytes, of the object being tree aggregated
+   */
+  private[spark] def getTreeAggregateIdealDepth(aggregatedObjectSizeInMb: Int) = {
+    val maxDriverResultSizeInMb = rows.conf.get[Long](MAX_RESULT_SIZE) / (1024 * 1024)
 
 Review comment:
  Sorry to pick on this, but what about dealing in bytes here, not MB? I think we might have a problem if the aggregatedObjectSize is so small that it rounds down to 0 MB, and then below you take the log of 0.
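To make the rounding concern concrete, here is a minimal Scala sketch. It assumes the depth is the doc-comment formula solved for depth, `ceil(log(numPartitions) / (log(maxMb) - log(objectMb)))`, evaluated on whole megabytes as in the diff; `MbRounding` and `naiveDepth` are hypothetical names and the sizes are illustrative. A 512 KB object truncates to 0 MB, `log(0)` is negative infinity, the denominator becomes positive infinity, and the result is a depth of 0, below `treeAggregate`'s minimum of 1.

```scala
object MbRounding {
  private val BytesPerMb: Long = 1024L * 1024L

  // Hypothetical: the depth formula evaluated in whole megabytes, as in the diff.
  def naiveDepth(numPartitions: Int, maxResultBytes: Long, objectBytes: Long): Double = {
    val maxMb = maxResultBytes / BytesPerMb // integer division truncates
    val objectMb = objectBytes / BytesPerMb // a 512 KB object becomes 0 MB
    val numerator = math.log(numPartitions.toDouble)
    val denominator = math.log(maxMb.toDouble) - math.log(objectMb.toDouble)
    math.ceil(numerator / denominator)
  }

  def main(args: Array[String]): Unit = {
    val objectBytes = 512L * 1024L // 512 KB
    println(objectBytes / BytesPerMb) // 0
    println(math.log(0.0))            // -Infinity
    println(naiveDepth(1000, 1024L * BytesPerMb, objectBytes)) // 0.0
  }
}
```

Working in bytes avoids the truncation, although a zero-byte object would still need an explicit guard.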
   
  I apologize for only thinking about this now, but I think we have a problem when the object size is nearly equal to the max. The desired depth could be really big, like 1000 or more.
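To put a number on "really big", here is a minimal sketch. It assumes the depth is the doc-comment formula solved for depth, `ceil(log(numPartitions) / (log(maxResultSize) - log(objectSize)))`, with no capping. `NearEqualSizes` and `uncappedDepth` are hypothetical names, and the sizes (1000 partitions, a 1024 MB limit) are illustrative.

```scala
object NearEqualSizes {
  // Hypothetical helper: the doc-comment inequality solved for depth, with no capping.
  def uncappedDepth(numPartitions: Int, maxResultBytes: Long, objectBytes: Long): Double = {
    val numerator = math.log(numPartitions.toDouble)
    val denominator = math.log(maxResultBytes.toDouble) - math.log(objectBytes.toDouble)
    math.ceil(numerator / denominator)
  }

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    // 1023 MB object against a 1024 MB limit: the depth lands in the thousands.
    println(uncappedDepth(1000, 1024 * mb, 1023 * mb))
    // A 64 MB object against the same limit: a modest depth of 3.
    println(uncappedDepth(1000, 1024 * mb, 64 * mb))
  }
}
```

The denominator `log(1024) - log(1023)` is roughly 0.001, so even a modest numerator is blown up by about three orders of magnitude.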
   
  Indeed, the denominator can be 0 or negative. I suspect we don't want to fail in this case, but rather just use a max depth there too.
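The denominator comes from solving the doc-comment inequality for the depth. Writing p for numPartitions, d for the depth, s for the object size and M for the driver's max result size (symbols chosen here for illustration), with p > 1 and d > 0:

```latex
\begin{aligned}
p^{1/d}\, s \le M
  &\iff \frac{\ln p}{d} + \ln s \le \ln M \\
  &\iff \frac{\ln p}{d} \le \ln M - \ln s \\
  &\iff d \ge \frac{\ln p}{\ln M - \ln s} \quad \text{if } \ln M - \ln s > 0 .
\end{aligned}
```

If s >= M, the denominator ln M - ln s is 0 or negative, and no depth satisfies the inequality, because the left-hand side ln(p)/d is positive for p > 1. As s approaches M from below, the bound on d grows without limit.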
   
  How about capping the depth between 1 and, say, 10 to be safe? As a heuristic, I don't think depths larger than that are reasonable anyway. Use 10 if the denominator is <= 0.
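Putting these suggestions together, a capped heuristic might look like the minimal sketch below. It makes three assumptions. Sizes are passed in bytes rather than MB. The [1, 10] bounds are the ones proposed in this comment. A non-positive limit is treated as "unlimited", following Spark's documentation of `spark.driver.maxResultSize = 0`. `CappedTreeDepth` and `idealDepth` are hypothetical names, not the PR's code.

```scala
object CappedTreeDepth {
  // Bounds as proposed in this review comment (assumption, not the merged values).
  val MinDepth = 1
  val MaxDepth = 10

  /** Sketch of a tree aggregate depth heuristic that works in bytes and clamps to [MinDepth, MaxDepth]. */
  def idealDepth(numPartitions: Int, maxResultBytes: Long, objectBytes: Long): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    require(objectBytes > 0, "object size must be positive")
    if (maxResultBytes <= 0) {
      MinDepth // assumption: a limit of 0 means unlimited, so no extra depth is needed
    } else {
      val numerator = math.log(numPartitions.toDouble)
      val denominator = math.log(maxResultBytes.toDouble) - math.log(objectBytes.toDouble)
      if (denominator <= 0) {
        MaxDepth // object is at least as large as the limit: fall back to the max depth
      } else {
        val depth = math.ceil(numerator / denominator)
        math.min(MaxDepth.toDouble, math.max(MinDepth.toDouble, depth)).toInt
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    println(idealDepth(1000, 1024 * mb, 64 * mb))     // 3
    println(idealDepth(1000, 1024 * mb, 1023 * mb))   // 10 (capped)
    println(idealDepth(1000, 1024 * mb, 2048 * mb))   // 10 (denominator < 0)
    println(idealDepth(1000, 1024 * mb, 512L * 1024)) // 1 (small object)
  }
}
```

Clamping keeps the heuristic total: every input that passes the `require` checks maps to a usable depth instead of 0, a negative number, or several thousand.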

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to