David,

The core challenge here is continuing scans under failure conditions. There are several places where we tear down the iterator tree and rebuild it: when tablet servers die, when we need to free resources to support concurrency, and a few others. To continue a scan where we left off, we need to be able to point to some place in the stream of key/value pairs. If we want to be robust against tablet server failure, we can't just store that scan session information on the tablet server, so we use the last key that was returned to the client as that pointer.
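
A rough way to picture the resume mechanism is the sketch below. It is a plain Python simulation under stated assumptions, not Accumulo code: `DATA`, `build_iterator`, and `scan` are hypothetical names, and a sorted list stands in for the tablet's key/value stream. The only state that survives each teardown is the last key handed to the client.

```python
from bisect import bisect_right

# Hypothetical stand-in for a tablet's sorted key/value pairs.
DATA = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]


def build_iterator(data, last_key=None):
    """Rebuild the iterator from scratch, seeking just past last_key."""
    keys = [k for k, _ in data]
    # With no last key, start at the beginning; otherwise start at the
    # first key strictly greater than the last one the client received.
    start = 0 if last_key is None else bisect_right(keys, last_key)
    return iter(data[start:])


def scan(data, batch_size):
    """Scan everything, tearing the iterator down after every batch."""
    last_key = None  # the only state kept across teardowns
    results = []
    while True:
        it = build_iterator(data, last_key)  # fresh iterator each round
        batch = [kv for _, kv in zip(range(batch_size), it)]
        if not batch:
            return results
        results.extend(batch)
        last_key = batch[-1][0]  # remember where the client left off
```

Because the resume point is derived only from `last_key`, the scan returns every pair exactly once regardless of how often the iterator is rebuilt, as long as the keys the client sees are the same keys the underlying data is sorted by.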
When you add an iterator that transforms keys, you change the meaning of that pointer so that it points into the transformed stream instead of the underlying stream. There are two requirements in order to do this sanely:

1. Your iterator should not change the row portion of the key, although it can change any of the remaining parts.
2. Your iterator's seek method should perform the reverse transformation on the range when it seeks the underlying iterator(s). This will ensure that you don't skip ranges of underlying keys when the scanner continues the scan.

That said, I agree that the behavior of the scanner where it ignores keys that are returned is probably not optimal, and I'm not sure why it does that, except maybe to prevent some infinite loops.

Adam

On Tue, Jan 22, 2013 at 11:55 AM, Slater, David M. <david.sla...@jhuapl.edu> wrote:

> In designing some of my own custom iterators, I was noticing some interesting behavior. Note: my iterator does not return the original key, but instead returns a computed value that is not necessarily in lexicographic order.
>
> So far as I can tell, when the Scanner switches between tablets, it checks the key that is returned in the new tablet and compares it (I think it compares key.row()) with the last key from the previous tablet. If the new key is greater than the previous one, then it proceeds normally. If, however, the new key is less than or equal to the previous key, then the Scanner does not return the value. It does, however, continue to iterate through the tablet, continuing to compare until it finds a key greater than the last one. Once it finds one, however, it progresses through the rest of that tablet without doing a check. (It implicitly assumes that everything in a tablet will be correctly ordered).
>
> Now if I was to return the original key, it would work fine (since it would always be in order), but that also limits the functionality of my custom iterator.
>
> My primary question is: why would it be designed this way? When switching between tablets, are there potential problems that might crop up if this check isn’t done?
>
> Thanks,
> David
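
The two requirements for key-transforming iterators given in the reply can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Accumulo iterator API: keys are modeled as `(row, qualifier)` tuples, the transformation (prefixing the qualifier with `x_`) is hypothetical, and `Underlying`, `Transforming`, and `scan_with_teardowns` are made-up names. The row is never changed, and `seek` applies the inverse transformation before seeking the source.

```python
from bisect import bisect_right

PREFIX = "x_"  # hypothetical transformation: prefix the qualifier
KEYS = [("r1", "a"), ("r1", "b"), ("r1", "c"), ("r2", "a")]


def transform(key):
    row, cq = key
    return (row, PREFIX + cq)  # requirement 1: the row is left untouched


def inverse(key):
    row, cq = key
    return (row, cq[len(PREFIX):] if cq.startswith(PREFIX) else cq)


class Underlying:
    """Sorted source of keys with an exclusive-start seek."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.pos = 0

    def seek(self, after=None):
        self.pos = 0 if after is None else bisect_right(self.keys, after)

    def read(self, n):
        out = self.keys[self.pos:self.pos + n]
        self.pos += len(out)
        return out


class Transforming:
    """Wraps a source and returns transformed keys."""

    def __init__(self, source, invert_on_seek=True):
        self.source = source
        self.invert_on_seek = invert_on_seek

    def seek(self, after=None):
        # Requirement 2: map the resume key back into the underlying
        # key space before seeking the source.
        if after is not None and self.invert_on_seek:
            after = inverse(after)
        self.source.seek(after)

    def read(self, n):
        return [transform(k) for k in self.source.read(n)]


def scan_with_teardowns(keys, batch_size, invert_on_seek=True):
    """Rebuild the iterator stack per batch, resuming from the last key."""
    last, results = None, []
    while True:
        it = Transforming(Underlying(keys), invert_on_seek)
        it.seek(last)
        batch = it.read(batch_size)
        if not batch:
            return results
        results.extend(batch)
        last = batch[-1]  # a key in the transformed stream
```

With `invert_on_seek=True`, `scan_with_teardowns(KEYS, 1)` returns all four transformed keys. With `invert_on_seek=False`, the resume key `("r1", "x_a")` is compared directly against untransformed keys, so the scan jumps past `("r1", "b")` and `("r1", "c")` and returns only two keys, which is the skipping that the second requirement is meant to prevent.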