On Wed, Apr 15, 2015 at 10:20 AM, Keith Turner <[email protected]> wrote:

> Random thought on revamp. Immutable key values with enough primitives to
> make most operations efficient (avoid constant alloc/copy) might be
> something to consider for the iterator API

So, is this a tradeoff in the performance vs. inter-iterator isolation space? From a performance perspective we would do best if we just passed around pointers to an underlying byte array (e.g. ByteBuffer-style), but maximum isolation would require never reusing anything returned from an iterator's getTopX methods. From a security perspective we need to be careful with how we reuse data objects (hence the need for the SynchronizedIterator at the top of the "system" iterators), but I would say we can probably relax other isolation concerns in the iterators in favor of performance.

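To make the tradeoff concrete, here is a rough sketch of the two extremes. Everything in it (TopKeySketch, load, topRowView, topRowCopy, the fixed 16-byte buffer) is made up for illustration and is not the Accumulo iterator API; it only shows what a caller observes when the source reuses its buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for an iterator that reuses one buffer per entry.
final class TopKeySketch {
    private final byte[] buffer = new byte[16]; // assumed fixed capacity
    private int length;

    // Simulates advancing to the next entry by overwriting the shared buffer.
    void load(byte[] row) {
        if (row.length > buffer.length) {
            throw new IllegalArgumentException("row too long for sketch buffer");
        }
        System.arraycopy(row, 0, buffer, 0, row.length);
        length = row.length;
    }

    // Fast path: a read-only view over the shared bytes. No copy is made,
    // so the view's contents change on the next load().
    ByteBuffer topRowView() {
        return ByteBuffer.wrap(buffer, 0, length).asReadOnlyBuffer();
    }

    // Isolated path: one allocation and one copy per call, but the result
    // is unaffected by later load() calls.
    byte[] topRowCopy() {
        return Arrays.copyOf(buffer, length);
    }

    public static void main(String[] args) {
        TopKeySketch it = new TopKeySketch();
        it.load("row1".getBytes(StandardCharsets.UTF_8));
        ByteBuffer view = it.topRowView();
        byte[] copy = it.topRowCopy();
        it.load("abc2".getBytes(StandardCharsets.UTF_8));
        // The view now reflects the second entry; the copy still holds the first.
        System.out.println((char) view.get(0));
        System.out.println((char) copy[0]);
    }
}
```

The read-only view stops a downstream iterator from writing into the buffer, but it does not stop the data from changing underneath a caller who holds on to it, which is the isolation we would be giving up.
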
I think there's probably a bigger project here around minimizing the object creation, data copying, serialization, and deserialization of keys. We did some work, which Chris McCubbin will be presenting at the upcoming Accumulo Summit, on pushing key comparisons down to a serialized form of the key, and that made a huge impact on load performance. I think we could probably achieve an order of magnitude more throughput in the iterator tree with a major refactoring.

Any thoughts on when we might have the appetite for such a change? If we're thinking about making key/values immutable then we might piggyback a bigger redesign on that already breaking change.

Adam
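
P.S. On comparing keys in serialized form, a minimal sketch of the general idea, not the implementation from the work mentioned above. It assumes a hypothetical encoding in which unsigned lexicographic byte order matches the desired sort order, and the name SerializedCompareSketch is invented for the example. The point is that two keys sitting in a shared block can be ordered without deserializing either one into an object.

```java
// Hypothetical helper: orders two serialized keys by their raw bytes.
final class SerializedCompareSketch {
    // Compares a[aOff, aOff+aLen) with b[bOff, bOff+bLen) as unsigned bytes.
    // Returns a negative, zero, or positive value like Comparator.compare.
    static int compare(byte[] a, int aOff, int aLen,
                       byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff; // mask so 0x80..0xff sort after 0x7f
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // Equal over the common prefix: the shorter slice sorts first.
        return aLen - bLen;
    }
}
```

With something like this, both arguments can be offsets into the same underlying byte array, so a comparison allocates nothing.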
