sumeetgajjar commented on a change in pull request #35047:
URL: https://github.com/apache/spark/pull/35047#discussion_r776197746
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
##########
@@ -1073,8 +1135,51 @@ private[joins] object LongHashedRelation {
         return HashedRelationWithAllNullKeys
       }
     }
-    map.optimize()
-    new LongHashedRelation(numFields, map)
+
+    val reorderMap = reorderFactor.exists(_ * map.numUniqueKeys <= map.numTotalValues)
+    val finalMap = if (reorderMap) {
+      // reorganize the hash map so that nodes of a given linked list are next to each other in
+      // memory.
+      logInfo(s"Reordering LongToUnsafeRowMap, numUniqueKeys: ${map.numUniqueKeys}, " +
+        s"numTotalValues: ${map.numTotalValues}")
+      // An exception due to insufficient memory can occur either during initialization or while
+      // adding rows to the map.
+      // 1. Failure occurs during initialization i.e. in LongToUnsafeRowMap.init:
+      //    release of the partially allocated memory is already taken care of in the
+      //    LongToUnsafeRowMap.ensureAcquireMemory method thus no further action is required.
+      // 2. Failure occurs while adding rows to the map: the partially allocated memory
+      //    is not cleaned up, thus LongToUnsafeRowMap.free is invoked in the catch clause.
+      var maybeCompactMap: Option[LongToUnsafeRowMap] = None
+      try {
+        maybeCompactMap = Some(new LongToUnsafeRowMap(taskMemoryManager,
+          Math.toIntExact(map.numUniqueKeys)))
+        val compactMap = maybeCompactMap.get
+        val resultRow = new UnsafeRow(numFields)
+        map.keys().foreach { rowKey =>
+          val key = rowKey.getLong(0)
+          map.get(key, resultRow).foreach { row =>
+            compactMap.append(key, row)
+          }
Review comment:
If an OOM-related exception is thrown while appending to the map,
we invoke `maybeCompactMap.foreach(_.free())` in the catch clause, which
releases the memory held by `compactMap`.
https://github.com/apache/spark/pull/35047/files#diff-127291a0287f790755be5473765ea03eb65f8b58b9ec0760955f124e21e3452fR1171
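
To make the cleanup path concrete, here is a minimal, self-contained sketch of the pattern described above. It is not the PR's code: `FakeRowMap`, `CleanupSketch`, and `buildCompact` are hypothetical names, an `IllegalStateException` stands in for the OOM-related exception, and the sketch simply rethrows after freeing (what the actual catch clause does after `free()` is not shown in the hunk above).

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for LongToUnsafeRowMap: it only records whether
// free() was called and fails once more than `capacity` rows are appended.
final class FakeRowMap(capacity: Int) {
  private var size = 0
  var freed = false

  def append(key: Long, value: String): Unit = {
    if (size >= capacity) {
      // Assumption: an ordinary exception models the out-of-memory failure.
      throw new IllegalStateException(s"cannot acquire memory for key $key")
    }
    size += 1
  }

  def free(): Unit = freed = true
}

object CleanupSketch {
  // Copies `rows` into a map produced by `newMap`. If construction fails,
  // maybeCompactMap is still None and there is nothing to free. If an
  // append fails, the partially filled map is freed before rethrowing.
  def buildCompact(rows: Seq[(Long, String)], newMap: () => FakeRowMap): FakeRowMap = {
    var maybeCompactMap: Option[FakeRowMap] = None
    try {
      maybeCompactMap = Some(newMap())
      val compactMap = maybeCompactMap.get
      rows.foreach { case (key, value) => compactMap.append(key, value) }
      compactMap
    } catch {
      case NonFatal(e) =>
        maybeCompactMap.foreach(_.free())
        throw e
    }
  }
}
```

Keeping the map in an `Option` declared outside the `try` is what lets a single catch clause cover both failure points: the `foreach` is a no-op when the constructor itself threw.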