agrawaldevesh commented on a change in pull request #29342:
URL: https://github.com/apache/spark/pull/29342#discussion_r470248197
##########
File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
##########
@@ -428,6 +428,62 @@ public MapIterator destructiveIterator() {
return new MapIterator(numValues, new Location(), true);
}
+ /**
+ * Iterator for the entries of this map. This is to first iterate over key index array
+ * `longArray` then accessing values in `dataPages`. NOTE: this is different from `MapIterator`
+ * in the sense that key index is preserved here
+ * (See `UnsafeHashedRelation` for example of usage).
+ */
+ public final class MapIteratorWithKeyIndex implements Iterator<Location> {
+
+ private int keyIndex = 0;
+ private int numRecords;
+ private final Location loc;
+
+ private MapIteratorWithKeyIndex(int numRecords, Location loc) {
+ this.numRecords = numRecords;
+ this.loc = loc;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return numRecords > 0;
+ }
+
+ @Override
+ public Location next() {
+ if (!loc.isDefined() || !loc.nextValue()) {
+ while (longArray.get(keyIndex * 2) == 0) {
+ keyIndex++;
+ }
+ loc.with(keyIndex, (int) longArray.get(keyIndex * 2 + 1), true);
+ keyIndex++;
Review comment:
I don't have a strong preference about bounds-checking `keyIndex`. I was
referring more to making sure that `numRecords <= numValues`. If we guarantee
that, then `keyIndex` shouldn't grow beyond `longArray.size()`.
I also think the bounds check is unlikely to be expensive relative to the
`taskMemoryManager.getPage(fullKeyAddress)` call buried inside
`Location.with`, which should be pretty memory bound.
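To illustrate the invariant, here is a minimal sketch, not the code in this PR. `KeyIndexScan` is a hypothetical, simplified stand-in for the iterator: it uses a plain `long[]` instead of `LongArray`, assumes one value per key, and returns the key index rather than a `Location`. It validates `numRecords` once in the constructor and keeps a cheap bounds check in the scan loop as a second line of defense.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical, simplified model of the key index array:
// slots[2 * i] holds a record address (0 means the slot is empty) and
// slots[2 * i + 1] holds the key's hash code. One value per key is assumed,
// so the number of occupied slots plays the role of numValues.
final class KeyIndexScan implements Iterator<Integer> {
  private final long[] slots;
  private int keyIndex = 0;
  private int numRecords;

  KeyIndexScan(long[] slots, int numRecords) {
    int occupied = 0;
    for (int i = 0; i < slots.length / 2; i++) {
      if (slots[i * 2] != 0) {
        occupied++;
      }
    }
    // The invariant from the comment above, checked once up front.
    if (numRecords > occupied) {
      throw new IllegalArgumentException(
          "numRecords (" + numRecords + ") exceeds occupied slots (" + occupied + ")");
    }
    this.slots = slots;
    this.numRecords = numRecords;
  }

  @Override
  public boolean hasNext() {
    return numRecords > 0;
  }

  @Override
  public Integer next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    // Skip empty slots. The bounds test cannot fail while the invariant
    // holds; it only guards against a corrupted record count.
    while (keyIndex * 2 < slots.length && slots[keyIndex * 2] == 0) {
      keyIndex++;
    }
    if (keyIndex * 2 >= slots.length) {
      throw new IllegalStateException("scanned past the end of the key index array");
    }
    numRecords--;
    return keyIndex++;
  }
}
```

With the up-front check in place, the per-step comparison is one integer compare per slot visited, which is the cost being weighed against the page lookup.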
##########
File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
##########
@@ -428,6 +428,62 @@ public MapIterator destructiveIterator() {
return new MapIterator(numValues, new Location(), true);
}
+ /**
+ * Iterator for the entries of this map. This is to first iterate over key index array
+ * `longArray` then accessing values in `dataPages`. NOTE: this is different from `MapIterator`
+ * in the sense that key index is preserved here
+ * (See `UnsafeHashedRelation` for example of usage).
+ */
+ public final class MapIteratorWithKeyIndex implements Iterator<Location> {
+
+ private int keyIndex = 0;
Review comment:
SGTM