Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/18543#discussion_r127763486
--- Diff:
sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java
---
@@ -211,7 +211,10 @@ public int compare(Object baseObj1, long baseOff1,
Object baseObj2, long baseOff
// TODO: Why are the sizes -1?
row1.pointTo(baseObj1, baseOff1, -1);
row2.pointTo(baseObj2, baseOff2, -1);
- return ordering.compare(row1, row2);
+ int comparison = ordering.compare(row1, row2);
+ row1.pointTo(null, 0L, -1);
+ row2.pointTo(null, 0L, -1);
--- End diff --
@cloud-fan @srowen It is a good idea to do this cleanup once at the end. I am
curious how to implement it, though.
@srowen proposed doing the cleanup in `UnsafeExternalSorter.cleanupResources` and
`UnsafeInMemorySorter.free`, which are called when a task finishes, but I do not
think that works in this case. [This
issue](https://issues.apache.org/jira/browse/SPARK-21319) occurs before the
task completes, whereas the `UnsafeExternalSorter` instance is registered with the
task's `taskContext`
[here](https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L159-L161),
so its cleanup runs only when the task finishes. That cleanup will therefore not
happen before an OOM occurs during execution of the task.
IIUC, the end of the sort is
[here](https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java#L345).
This line calls [this sort
method](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Sorter.scala#L36).
Either of two options could work: do the cleanup at the first location, right
after the sort call returns, or do it at the second, inside the sort method
after checking the type of the given comparator.
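
To make the first option concrete, here is a minimal, self-contained sketch of clearing the comparator's row holders once, right after the sort returns. It is not Spark code: `ReusableRow`, `RowComparator`, `reset`, and `SortWithCleanup` are hypothetical stand-ins for `UnsafeRow`, the row comparator in the diff, and the sort call site, and `Arrays.sort` stands in for Spark's `Sorter`.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for UnsafeRow: a reusable pointer into a backing object.
final class ReusableRow {
    Object base;   // backing object (e.g. a memory page) this row points into
    long offset;

    void pointTo(Object base, long offset) {
        this.base = base;
        this.offset = offset;
    }
}

// Hypothetical comparator that reuses two row holders across comparisons,
// so it keeps references to the last two records it compared.
final class RowComparator implements Comparator<long[]> {
    final ReusableRow row1 = new ReusableRow();
    final ReusableRow row2 = new ReusableRow();

    @Override
    public int compare(long[] a, long[] b) {
        row1.pointTo(a, 0L);
        row2.pointTo(b, 0L);
        return Long.compare(a[0], b[0]);
    }

    // Drop the references left over from the last comparison.
    void reset() {
        row1.pointTo(null, 0L);
        row2.pointTo(null, 0L);
    }
}

final class SortWithCleanup {
    // Sort the records, then clear the comparator's holders exactly once,
    // instead of clearing them after every single comparison.
    static void sort(long[][] records, RowComparator cmp) {
        try {
            Arrays.sort(records, cmp);
        } finally {
            cmp.reset();
        }
    }
}
```

With this shape the per-comparison cost stays unchanged, and the holders stop pinning the backing objects as soon as the sort finishes rather than at task completion.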
What do you think?