cloud-fan commented on a change in pull request #29342:
URL: https://github.com/apache/spark/pull/29342#discussion_r470463130
##########
File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
##########
@@ -428,6 +428,68 @@ public MapIterator destructiveIterator() {
return new MapIterator(numValues, new Location(), true);
}
+ /**
+ * Iterator for the entries of this map. This is to first iterate over key index array
+ * `longArray` then accessing values in `dataPages`. NOTE: this is different from `MapIterator`
+ * in the sense that key index is preserved here
+ * (See `UnsafeHashedRelation` for example of usage).
+ */
+ public final class MapIteratorWithKeyIndex implements Iterator<Location> {
+
+ /**
+ * The index in `longArray` where the key is stored.
+ */
+ private int keyIndex = 0;
+
+ private int numRecords;
+ private final Location loc;
+
+ private MapIteratorWithKeyIndex() {
+ this.numRecords = numValues;
+ this.loc = new Location();
+ }
+
+ @Override
+ public boolean hasNext() {
+ return numRecords > 0;
+ }
+
+ @Override
+ public Location next() {
+ if (!loc.isDefined() || !loc.nextValue()) {
+ while (longArray.get(keyIndex * 2) == 0) {
+ keyIndex++;
+ }
+ loc.with(keyIndex, 0, true);
+ keyIndex++;
+ }
+ numRecords--;
+ return loc;
+ }
+ }
+
+ /**
+ * Returns an iterator for iterating over the entries of this map,
+ * by first iterating over the key index inside hash map's `longArray`.
+ *
+ * For efficiency, all calls to `next()` will return the same {@link Location} object.
+ *
+ * The returned iterator is NOT thread-safe. If the map is modified while iterating over it,
+ * the behavior of the returned iterator is undefined.
+ */
+ public MapIteratorWithKeyIndex iteratorWithKeyIndex() {
+ return new MapIteratorWithKeyIndex();
+ }
+
+ /**
+ * The maximum number of allowed keys index.
+ *
+ * The value of allowed keys index is in the range of [0, maxNumKeysIndex - 1].
+ */
+ public int maxNumKeysIndex() {
Review comment:
Maybe `maxAllowedKeyIndex` is a more accurate name.
For example, the largest key index actually in use may be 90 while the `longArray` can
hold 100 keys, so `maxAllowedKeyIndex` is 100, not 90.
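
To make the naming point concrete, here is a minimal sketch of the distinction, not the actual `BytesToBytesMap` code. It assumes only what the diff shows: two longs per slot (`keyIndex * 2`) and an address of 0 marking an empty slot. The class `KeyIndexSketch` and all of its methods are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the map's `longArray`; not Spark code.
final class KeyIndexSketch {

    // Two longs per slot, as implied by `longArray.get(keyIndex * 2)` in the diff.
    static long[] newSlots(int capacity) {
        return new long[capacity * 2];
    }

    // Mark a slot as occupied by storing a non-zero address in it.
    static void occupy(long[] slots, int keyIndex, long address) {
        slots[keyIndex * 2] = address;
    }

    // Number of key slots the array can hold: the exclusive upper bound on key indexes.
    static int capacity(long[] slots) {
        return slots.length / 2;
    }

    // Scan slots in index order and skip empty ones, similar in spirit to `next()`.
    static List<Integer> occupiedKeyIndexes(long[] slots) {
        List<Integer> result = new ArrayList<>();
        for (int keyIndex = 0; keyIndex < capacity(slots); keyIndex++) {
            if (slots[keyIndex * 2] != 0) {
                result.add(keyIndex);
            }
        }
        return result;
    }

    // Largest key index currently in use, or -1 if every slot is empty.
    static int largestOccupiedKeyIndex(long[] slots) {
        int largest = -1;
        for (int keyIndex = 0; keyIndex < capacity(slots); keyIndex++) {
            if (slots[keyIndex * 2] != 0) {
                largest = keyIndex;
            }
        }
        return largest;
    }

    public static void main(String[] args) {
        long[] slots = newSlots(100);
        occupy(slots, 3, 1L);
        occupy(slots, 42, 2L);
        occupy(slots, 90, 3L);
        System.out.println(occupiedKeyIndexes(slots));      // [3, 42, 90]
        System.out.println(largestOccupiedKeyIndex(slots)); // 90
        System.out.println(capacity(slots));                // 100
    }
}
```

A caller that sizes a structure by key index (a bitset of matched keys, say) would need the capacity-based bound, since it stays valid whichever slots end up occupied.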