Hi folks,

The StatsCombiner [1] shows one way for an Iterator to distinguish between processed and unprocessed data. In this case, the StatsCombiner treats string representations of integers as unprocessed data and comma-separated string representations of integers as processed data.
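To make that concrete, here is a rough stand-alone sketch of the distinction as I understand it. It does not use the Accumulo API, the class and method names are made up for illustration, and the "min,max,sum,count" layout of the processed form is my reading of the example rather than something I have verified line by line.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not the actual StatsCombiner code.
final class StatsValueSketch {

    // Assumption: a value containing a comma is an already-combined
    // "min,max,sum,count" record; anything else is a single raw integer.
    static boolean isProcessed(String value) {
        return value.indexOf(',') >= 0;
    }

    // Folds a mix of raw and processed values into one "min,max,sum,count" string.
    static String reduce(List<String> values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long sum = 0;
        long count = 0;
        for (String v : values) {
            if (isProcessed(v)) {
                // Processed data: merge the partial statistics.
                String[] parts = v.split(",");
                min = Math.min(min, Long.parseLong(parts[0]));
                max = Math.max(max, Long.parseLong(parts[1]));
                sum += Long.parseLong(parts[2]);
                count += Long.parseLong(parts[3]);
            } else {
                // Unprocessed data: fold in a single observation.
                long n = Long.parseLong(v);
                min = Math.min(min, n);
                max = Math.max(max, n);
                sum += n;
                count++;
            }
        }
        return min + "," + max + "," + sum + "," + count;
    }

    public static void main(String[] args) {
        // Two raw values and one previously combined record.
        System.out.println(reduce(Arrays.asList("4", "1,9,15,3", "7")));
    }
}
```

The point is that the two forms are told apart purely by the shape of the value, which only works because a raw integer can never contain a comma.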
Two questions:

First, is it possible to do this in an arbitrary fashion? For example, let's say my Iterator adds Values to a bloom filter which it maintains internally, like a combiner, but potentially across multiple CFs. If the Iterator encounters unprocessed data, it should offer it to the bloom filter. If it encounters processed data (i.e. a bloom filter), it should merge it with its own bloom filter.

The only way that I can think of to do this is to have a higher-priority iterator that "escapes" Values, and have my Iterator emit unescaped Values. Then my Iterator can decide what to do based on whether the current Value is escaped. I find this approach pretty kludgy though, and any advice is welcome.

Second, is the need to distinguish between processed and unprocessed data due to the Iterator running in all three scopes (scan, minc, and majc)? Would a per-scanner Iterator, or an Iterator running only in scan scope, be guaranteed to see only unprocessed data?

Thanks,
-Russ

[1] https://github.com/apache/accumulo/blob/master/examples/simple/src/main/java/org/apache/accumulo/examples/simple/combiner/StatsCombiner.java
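
In case it helps the discussion, here is a rough sketch of the escaping idea from my first question. It is a variant that puts a one-byte marker in front of every Value instead of escaping only the raw ones. It is plain Java with no Accumulo classes; the marker bytes, the class and method names, and the toy BitSet-backed bloom filter are all made up for illustration and are not a proposal for the real encoding.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: a one-byte marker distinguishes raw Values from
// serialized filters. Nothing here is Accumulo API.
final class TaggedValueSketch {
    static final byte RAW = 0;     // assumed marker for unprocessed data
    static final byte FILTER = 1;  // assumed marker for a serialized filter
    static final int BITS = 1024;  // toy filter size

    // What the higher-priority "escaping" iterator would do to each Value.
    static byte[] tag(byte marker, byte[] payload) {
        byte[] out = new byte[payload.length + 1];
        out[0] = marker;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    static byte[] payload(byte[] tagged) {
        return Arrays.copyOfRange(tagged, 1, tagged.length);
    }

    // Toy bloom filter insert using two cheap hash positions.
    static void offer(BitSet filter, byte[] raw) {
        int h = Arrays.hashCode(raw);
        filter.set(Math.floorMod(h, BITS));
        filter.set(Math.floorMod(h * 31 + 17, BITS));
    }

    static boolean mightContain(BitSet filter, byte[] raw) {
        int h = Arrays.hashCode(raw);
        return filter.get(Math.floorMod(h, BITS))
            && filter.get(Math.floorMod(h * 31 + 17, BITS));
    }

    // The decision my Iterator would make per Value (assumes a non-empty array).
    static void accept(BitSet filter, byte[] tagged) {
        if (tagged[0] == FILTER) {
            filter.or(BitSet.valueOf(payload(tagged))); // merge processed data
        } else {
            offer(filter, payload(tagged));             // add unprocessed data
        }
    }

    // What my Iterator would emit: the filter itself, marked as processed.
    static byte[] emit(BitSet filter) {
        return tag(FILTER, filter.toByteArray());
    }
}
```

This makes the decision independent of what the raw Values look like, at the cost of needing the extra iterator (or the writers) to add the marker, which is the part I find kludgy.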