GitHub user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/7849#discussion_r36036047
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeFixedWidthAggregationMap.java
---
@@ -225,4 +232,93 @@ public void printPerfMetrics() {
System.out.println("Total memory consumption (bytes): " +
map.getTotalMemoryConsumption());
}
+ /**
+ * Sorts the key, value data in this map in place, and return them as an iterator.
+ *
+ * The only memory that is allocated is the address/prefix array, 16 bytes per record.
+ */
+ public KVIterator<UnsafeRow, UnsafeRow> sortedIterator() {
+ int numElements = map.numElements();
+ final int numKeyFields = groupingKeySchema.size();
+ TaskMemoryManager memoryManager = map.getTaskMemoryManager();
+
+ UnsafeExternalRowSorter.PrefixComputer prefixComp =
+ SortPrefixUtils.createPrefixGenerator(groupingKeySchema);
+ PrefixComparator prefixComparator = SortPrefixUtils.getPrefixComparator(groupingKeySchema);
+
+ final BaseOrdering ordering = GenerateOrdering.create(groupingKeySchema);
+ RecordComparator recordComparator = new RecordComparator() {
+ private final UnsafeRow row1 = new UnsafeRow();
+ private final UnsafeRow row2 = new UnsafeRow();
+
+ @Override
+ public int compare(Object baseObj1, long baseOff1, Object baseObj2, long baseOff2) {
+ row1.pointTo(baseObj1, baseOff1 + 4, numKeyFields, -1);
--- End diff --
The `-1` here is for `sizeInBytes`. If we needed the real value, I guess we could
read it, since we know where the size is stored relative to the row address.
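To illustrate the suggestion above: the `+ 4` passed to `pointTo` suggests that each record begins with a 4-byte key length stored immediately before the key bytes. The sketch below is a hypothetical model of that assumed layout, using a plain `java.nio.ByteBuffer` as the "page" rather than Spark's actual memory pages or unsafe accessors. It shows how a comparator could read the stored length at the record address instead of passing `-1`. The class `KeySizeLookup` and all of its helpers are invented for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the ASSUMED record layout [int keyLength][key bytes],
 * inferred from the "baseOff + 4" in the diff. Not Spark's real API or layout code.
 */
final class KeySizeLookup {

  /** Size of the assumed length prefix that precedes each key. */
  static final int LENGTH_PREFIX_BYTES = 4;

  /** Appends one record ([int keyLength][key bytes]) and returns its start offset. */
  static int appendRecord(ByteBuffer page, byte[] key) {
    int recordOffset = page.position();
    page.putInt(key.length); // length prefix written at the record address
    page.put(key);           // key data follows the prefix
    return recordOffset;
  }

  /** Reads the key length stored at the record address (absolute read). */
  static int keySizeAt(ByteBuffer page, int recordOffset) {
    return page.getInt(recordOffset);
  }

  /** Offset of the key data: record address plus the length prefix (the "+ 4"). */
  static int keyDataOffset(int recordOffset) {
    return recordOffset + LENGTH_PREFIX_BYTES;
  }

  /** Uses the recovered size to copy exactly the key bytes out of the page. */
  static String keyAsString(ByteBuffer page, int recordOffset) {
    byte[] key = new byte[keySizeAt(page, recordOffset)];
    int start = keyDataOffset(recordOffset);
    for (int i = 0; i < key.length; i++) {
      key[i] = page.get(start + i);
    }
    return new String(key, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    ByteBuffer page = ByteBuffer.allocate(64).order(ByteOrder.nativeOrder());
    int first = appendRecord(page, "abc".getBytes(StandardCharsets.UTF_8));
    int second = appendRecord(page, "hello".getBytes(StandardCharsets.UTF_8));

    // Instead of passing -1, the real size can be recovered from the record address.
    System.out.println(keySizeAt(page, first));      // 3
    System.out.println(keyDataOffset(first));        // 4
    System.out.println(keySizeAt(page, second));     // 5
    System.out.println(keyDataOffset(second));       // 11
    System.out.println(keyAsString(page, second));   // hello
  }
}
```

Reading the length costs one extra 4-byte load per comparison, so passing `-1` is a reasonable shortcut as long as nothing on the comparison path reads `sizeInBytes`.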