> > is it the same instance of the iterator object
No, it is not.

On Fri, May 15, 2015 at 2:16 PM, Dave Hardcastle <[email protected]> wrote:

> Jim,
>
> That explains a lot - I knew that the iterator stack could be resumed in
> the middle of a range, but didn't realise that it used the last emitted key
> to decide where to resume.
>
> Just so I'm clear, when iterators get stopped and later resumed, is it the
> same instance of the iterator object that's restarted (so that I could
> store state in there and use that to help the reseek) or is it a new
> instance of the iterator that has to be able to resume purely on the basis
> of the last emitted key?
>
> As you say though, it's probably best to stick to modifying values only.
>
> Thanks very much,
>
> Dave.
>
> On 15 May 2015 at 18:55, James Hughes <[email protected]> wrote:
>
>> Hi Dave,
>>
>> The big thing to note is that your iterator stack may get stopped and
>> torn down for various reasons. As Accumulo recreates the stack, it will
>> call 'seek' with the last emitted key in order to resume.
>>
>> If you are returning keys out of order in an iterator, the 'seek' method
>> needs to be able to undo the transformation and call 'seek' appropriately.
>> That's not impossible, but it isn't trivial.
>>
>> In GeoMesa, we did something like that at one point (without having a
>> smart 'seek'). I enjoyed two days of debugging trying to figure out why
>> medium sized requests would hang. (There was an infinite loop....) From
>> that experience, I'd suggest only modifying values.
>>
>> Cheers,
>>
>> Jim
>>
>> On Fri, May 15, 2015 at 1:26 PM, Dave Hardcastle <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> I've always assumed that the last iterator in the stack can make
>>> arbitrary changes to keys and values, including not returning the keys in
>>> sorted order. I know that SortedKeyValueIterator says that "anything
>>> implementing this interface should return keys in sorted order" - but I
>>> don't see a good reason that has to be true for the final iterator. This
>>> assumption seems to be backed up by the manual which says that "the only
>>> safe way to generate additional data in an iterator is to alter the current
>>> key-value pair" - it doesn't say that making arbitrary modifications to the
>>> rowkey or key is forbidden.
>>>
>>> I have a situation where I am making a transformation of the rowkey that
>>> may not preserve the ordering of the keys. When I scan for individual
>>> ranges I get the correct results. When I scan for two ranges using a
>>> BatchScanner, I get lots of data back which is not in the ranges I queried
>>> for. I am not explicitly checking that I have not gone beyond the range,
>>> but that should not be necessary as I am not doing any seeking, only
>>> consuming the key-values I receive.
>>>
>>> So, my main question is whether the last iterator is allowed to not
>>> return keys in sorted order?
>>>
>>> Thanks,
>>>
>>> Dave.
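
For anyone finding this thread later, here is a rough sketch of why resuming from the last emitted key goes wrong once the final iterator rewrites keys. It is a toy model in plain Python, not Accumulo code: `scan`, `transform`, `batch_size` and `max_batches` are made-up names, and it assumes the stack is torn down after every batch, whereas in practice that can happen at other times and for various reasons. The only state that survives a teardown in this model is the last key handed to the client, which matches the answer above that a new iterator instance is used.

```python
from bisect import bisect_right


def scan(sorted_keys, transform, batch_size=2, max_batches=20):
    """Toy model of a scan whose iterator stack is rebuilt after each batch.

    Nothing survives a rebuild except the last key the client saw; the next
    batch begins by seeking to just after that key in the sorted source data.
    """
    emitted = []
    last_emitted = None
    for _ in range(max_batches):
        # Fresh "stack": the resume position comes only from the last emitted key.
        if last_emitted is None:
            start = 0
        else:
            start = bisect_right(sorted_keys, last_emitted)
        batch = sorted_keys[start:start + batch_size]
        if not batch:
            return emitted, True  # source exhausted, scan finished
        for key in batch:
            last_emitted = transform(key)  # the final iterator may rewrite the key
            emitted.append(last_emitted)
    return emitted, False  # hit the safety cap: the scan never finished


data = ["a1", "b2", "c3", "d4"]  # hypothetical row keys, already sorted

# Keys left alone (e.g. only values modified): the resume position is always right.
unchanged, unchanged_done = scan(data, lambda k: k)

# Keys reversed, so "a1" becomes "1a": the transformed key sorts before every
# source key, each re-seek lands back at the start, and the scan loops.
reordered, reordered_done = scan(data, lambda k: k[::-1])

print(unchanged, unchanged_done)
print(reordered[:6], reordered_done)
```

With the keys left alone the scan completes normally. With the reversing transform, the emitted key no longer describes a position in the underlying data, so every resume restarts from the beginning, which looks a lot like the hang Jim describes. A 'seek' that undoes the transformation before seeking would avoid this, but as noted above, leaving keys alone and changing only values is the simpler option.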
