Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r187884949
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
     
         // There is 8 bytes for the pointer to next value
    -    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
    +    val needSize = cursor + 8 + row.getSizeInBytes
    +    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
    +    if (needSize > nowSize) {
           val used = page.length
           if (used >= (1 << 30)) {
             sys.error("Can not build a HashedRelation that is larger than 8G")
           }
    -      ensureAcquireMemory(used * 8L * 2)
    --- End diff ---
    
    Doubling the size when growing is very typical; it seems what you want to 
address is the case where there is enough memory for the requested size but not 
enough for doubling. I'd suggest we still double the size most of the time, as 
long as there is enough memory.
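
    A minimal sketch of that policy (illustrative only, not the actual 
HashedRelation code): prefer doubling the page, and fall back to growing just 
enough for the pending row when doubling would not fit. The names 
`GrowthSketch`, `nextPageLength`, and `canAcquire` are made up for this 
example, and `canAcquire` stands in for the memory-manager check.

    // Sketch of the suggested growth policy (not Spark code).
    object GrowthSketch {
      // Returns the new page length in 8-byte words. `neededWords` is the
      // minimum number of words required for the pending row; `canAcquire`
      // reports whether `bytes` more memory can still be reserved.
      def nextPageLength(used: Int, neededWords: Long, canAcquire: Long => Boolean): Int = {
        val doubled = used.toLong * 2
        if (canAcquire(doubled * 8L)) {
          // Double most of the time, as long as there is enough memory.
          doubled.toInt
        } else if (canAcquire(neededWords * 8L)) {
          // Enough memory for the requested size but not for doubling:
          // grow only as much as the current row requires.
          neededWords.toInt
        } else {
          sys.error("Can not grow the page: not enough memory")
        }
      }

      def main(args: Array[String]): Unit = {
        // Pretend only 12 words (96 bytes) can still be acquired: doubling
        // 8 -> 16 words fails, but growing to the requested 10 words succeeds.
        val limitBytes = 12 * 8L
        println(nextPageLength(8, 10, bytes => bytes <= limitBytes)) // prints 10
      }
    }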


---
