Thank you for your answer Josh, you understood my use case perfectly.
The possible solutions you propose had come to my mind, too. This confirms that, unfortunately, there is no elegant way around this problem. Is there any good documentation on query planning for Accumulo that could help with my use case?

Thanks.

Regards,
Max

From: Josh Elser <josh.el...@gmail.com>
To: user@accumulo.apache.org
Date: 09/01/2017 21:55
Subject: Re: is there any "trick" to save the state of an iterator?

Hey Max,

There is no provided mechanism to do this, and this is a problem with supporting "range queries". I hope I'm understanding your use case correctly; sorry in advance if I'm going off on a tangent.

When performing the standard sort-merge join across some columns to implement intersections and unions, the unsorted range of values you want to scan over (500k-600k) breaks the ordering of the docIds you are trying to collect. The trivial solution is to convert the range into a union of discrete values (500000 || 500001 || 500002 || ...), but you can see how this quickly falls apart. An inverted index could be used to enumerate the values that actually exist in the range. Another trivial solution would be to select all records matching the more selective condition and then post-filter on the other condition. There may be trickier query-planning decisions you could also experiment with (I'd have to give it a lot more thought).

In short, I'd recommend against trying to solve the problem by saving state. Architecturally, this is just not something that Accumulo iterators are designed to support at this time.

- Josh

Massimilian Mattetti wrote:
> Hi all,
>
> I am working with a document-partitioned index table whose index
> sections are accessed using ranges over the indexed properties (e.g.
> property A ∈ [500,000 - 600,000], property B ∈ [0.1 - 0.4], etc.).
> The iterator that handles this table works in two steps: first, it
> computes (via intersections and unions over the different properties)
> the full result set from the index section of a single bin; second,
> using the ids retrieved from the index, it scans the data section of
> that bin.
> This iterator has shown a significant performance penalty whenever the
> amount of data retrieved from the index is orders of magnitude bigger
> than table.scan.max.memory, i.e. the iterator is torn down tens of
> times for each bin. Since there is no explicit way to save the state
> of an iterator, is there any other mechanism/approach that I could
> use in order to avoid re-computing the index result set after each
> teardown?
> Thanks.
>
> Regards,
> Max
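For readers following the thread, here is a minimal, self-contained Java sketch of the range-to-union rewrite Josh describes: a range predicate over an indexed property is expanded into the discrete values that exist in the range, and merging their per-value sorted docId lists restores the global docId ordering the sort-merge intersection needs. The class name, the in-memory `INDEX_A` map, and the helper methods are all illustrative stand-ins for one bin's index section, not Accumulo APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class RangeAsUnion {

    // Hypothetical index section of a single bin: for each indexed
    // value of property A, the sorted list of docIds containing it.
    static final Map<Integer, List<Integer>> INDEX_A = Map.of(
        500000, List.of(1, 4, 9),
        500001, List.of(2, 4),
        600000, List.of(3, 9));

    // Rewrite "A in [lo, hi]" as a union of the discrete values present
    // in the range; the TreeSet merge restores sorted docId order.
    static SortedSet<Integer> union(Map<Integer, List<Integer>> index,
                                    int lo, int hi) {
        SortedSet<Integer> out = new TreeSet<>();
        for (var e : index.entrySet()) {
            if (e.getKey() >= lo && e.getKey() <= hi) {
                out.addAll(e.getValue());
            }
        }
        return out;
    }

    // The intersection step of the sort-merge join then operates on two
    // sorted docId streams.
    static List<Integer> intersect(SortedSet<Integer> a, SortedSet<Integer> b) {
        return a.stream().filter(b::contains).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        SortedSet<Integer> aInRange = union(INDEX_A, 500000, 600000);
        SortedSet<Integer> bMatches = new TreeSet<>(List.of(4, 9, 11));
        System.out.println(intersect(aInRange, bMatches)); // prints [4, 9]
    }
}
```

As Josh notes, this falls apart when the range spans many values; an inverted index that enumerates only the values actually present keeps the union small.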
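Josh's other suggestion, selecting all records matching the more selective condition and post-filtering on the wider one, can be sketched the same way. `Doc`, `query`, and the sample data are hypothetical; the point is that the wide range on property A is applied as a plain per-record filter rather than joined through the index.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PostFilter {
    // Illustrative stand-in for a record in a bin's data section.
    record Doc(int id, int a, double b) {}

    // byB: the (small) candidate set already retrieved via the index on
    // the selective property B; the A range is checked as a filter.
    static List<Doc> query(List<Doc> byB, int aLo, int aHi) {
        return byB.stream()
                  .filter(d -> d.a() >= aLo && d.a() <= aHi)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> byB = List.of(
            new Doc(4, 500005, 0.2),
            new Doc(9, 700000, 0.3));
        // Keeps only doc 4, whose A value falls inside [500000, 600000].
        System.out.println(query(byB, 500000, 600000));
    }
}
```

This trades index work for data-section reads, so it only pays off when the selective condition's result set is genuinely small.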