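I think you are missing the last one because next() calls super.next() at the end and your hasTop() calls super.hasTop(). By the time the last key/value has been copied into emitKey and emitValue, the source has already moved past it, so super.hasTop() returns false and that last entry is never returned.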
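To make that concrete, here is a minimal, self-contained Scala sketch of the look-ahead pattern. It deliberately does not use the Accumulo API: ListSource, LookAheadIterator, and the fixed flag are hypothetical stand-ins I made up for illustration, and the range checks from your next() are left out.

```scala
// Hypothetical stand-in for a sorted key/value source; NOT the Accumulo API.
class ListSource(data: Vector[(String, String)]) {
  private var idx = 0
  def hasTop: Boolean = idx < data.length
  def top: (String, String) = data(idx)
  def next(): Unit = idx += 1
}

// Look-ahead wrapper: next() copies the source's top entry and then
// advances the source, so the source is always one step ahead.
class LookAheadIterator(source: ListSource, fixed: Boolean) {
  private var emit: Option[(String, String)] = None

  def seek(): Unit = next()

  def next(): Unit = {
    if (source.hasTop) {
      emit = Some(source.top) // remember the entry to hand out
      source.next()           // source now points past that entry
    } else {
      emit = None             // nothing left to hand out
    }
  }

  // fixed = false mirrors the thread's code: it asks the source, which is
  // already past the entry held in emit. fixed = true asks emit instead.
  def hasTop: Boolean = if (fixed) emit.isDefined else source.hasTop

  def top: (String, String) = emit.get
}

object LookAheadDemo {
  // Drive the iterator the way a scan would: seek, then hasTop/top/next.
  def drain(it: LookAheadIterator): Vector[String] = {
    val out = Vector.newBuilder[String]
    it.seek()
    while (it.hasTop) {
      out += it.top._1
      it.next()
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val rows = Vector("r1" -> "v1", "r2" -> "v2", "r3" -> "v3")
    println(drain(new LookAheadIterator(new ListSource(rows), fixed = false))) // Vector(r1, r2)
    println(drain(new LookAheadIterator(new ListSource(rows), fixed = true)))  // Vector(r1, r2, r3)
  }
}
```

In your CIterator the analogous change would probably be to have hasTop() report whether emitKey is set, and to have next() clear emitKey when the source has no top. That is my assumption from reading the snippet; I have not run it against Accumulo.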

On Tue, Sep 29, 2015 at 3:45 PM, Moises Baly <[email protected]> wrote:

> Hi there,
>
> I'm writing a custom iterator, which essentially is obtaining a range of
> values using a slightly different way to compare the rows (for keeping in
> range). In one test, it should return every row in Accumulo, but it's
> missing the last one. The most important parts of the code would look like
> this:
>
> class CIterator extends WrappingIterator() {
>   private var emitKey: Key = _
>   private var emitValue: Value = _
>
>   override def deepCopy(env: IteratorEnvironment): SortedKeyValueIterator[Key, Value] = {
>     new CIterator(this, env)
>   }
>
>   def this(_this: CIterator, env: IteratorEnvironment) = {
>     this()
>     setSource(_this.getSource.deepCopy(env))
>   }
>
>   override def init(source: SortedKeyValueIterator[Key, Value], options: util.Map[String, String], env: IteratorEnvironment) = {
>     super.init(source, options, env)
>   }
>
>   override def getTopKey(): Key = {
>     emitKey
>   }
>
>   override def getTopValue(): Value = {
>     emitValue
>   }
>
>   override def hasTop(): Boolean = {
>     super.hasTop
>   }
>
>   override def seek(range: Range, columnFamilies: util.Collection[ByteSequence], inclusive: Boolean): Unit = {
>
>     ...
>
>     val seekRange = new Range(partialKeyStart.toString, true, partialKeyEnd.toString, true)
>
>     super.seek(seekRange, columnFamilies, inclusive);
>
>     if (super.hasTop()) {
>       next();
>     }
>   }
>
>   override def next(): Unit = {
>     ...
>     val lowerBoundCheck = rangeStart.compareTo(nextKey.getRow.toString)
>     val upperBoundCheck = rangeEnd.compareTo(nextKey.getRow.toString)
>     if (lowerBoundCheck <= 0 && upperBoundCheck >= 0){
>       emitKey = new Key(nextKey)
>       emitValue = new Value(nextValue)
>       if (super.hasTop()){
>         super.next()
>       }
>
>     }
>   }
> }
>
> So that code, if I have a range that comprises every row, returns every one
> of them but the last one.
> A high level call list would look like this:
>
> Seek ->
> Next ->
> hasTop ->
> Top key ->
> Top key ->
> Top value ->
> hasTop ->
> Top key ->
> Top value ->
> Next ->
> hasTop ->
> Top key ->
> Top key ->
> Top value ->
> (print row - value 1) -> hasTop ->
> Top key ->
> Top value ->
> Next ->
> hasTop ->
> Top key ->
> Top key ->
> Top value -> (print row - value 2) -> hasTop ->
> Top key ->
> Top value ->
> Next ->
> hasTop -> (print row - value 3) -> hasTop ->
>
> I think I'm missing something on the call tree:
>
> 1- Is it normal to have many subsequent topKey() calls after next()?
>
> 2- This is supposed to give me every row (the condition put in place for the
> range is working), but as you can see, it stops after the last next() call,
> for some reason (maybe something to do with the interfaces hierarchy?)
>
> 3- In general, what would be a correct approach (execution path) for building
> a custom iterator? I'm still hesitant on how the iterator functions (next,
> seek, getTop...) interact with each other, especially in the way we give back
> results to clients.
>
> Thank you for your time,
>
> Moises
>
> On Tue, Sep 29, 2015 at 11:16 AM, Keith Turner <[email protected]> wrote:
>
>> On Tue, Sep 29, 2015 at 12:59 AM, mohit.kaushik <[email protected]> wrote:
>>
>>> Hi Keith,
>>>
>>> When we fetch a column or column family, it seems it does not seek and
>>> only scans by filtering the key/value pairs. But as you said, if I design a
>>> custom iterator to fetch a column family, it may work faster.
>>
>> When column families are fetched, Accumulo will seek[1]. It tries to
>> read 10 cells and then seeks.
>>
>> When fetching family and qualifier, two iterators are used. The
>> ColumnFamilySkippingIterator and ColumnQualifierFilter. The
>> ColumnQualifierFilter does a scan of all qualifiers within a family [2].
>> The system configures the qualifier filter to have the family skipping iter
>> as a source[3], so it could still seek between families.
>>
>>> But I want to know what would be the scenario if I define a locality
>>> group for the column family and run the same custom iterator on it, which
>>> both scans and seeks? What would be the impact on performance (gain or loss)?
>>
>> Like Josh said, it really depends on your situation. It's hard to offer an
>> opinion w/o knowing more about the schema and the queries.
>>
>> Below I expanded on what Josh mentioned.
>>
>> If you have a locality group, it can really help in the case where you
>> have many rows that have a few families. For example if you have 10^7 rows
>> in a tablet and only 10^3 have a certain column family that's in a locality
>> group, it can make it very fast to find those 1000 rows. W/o a locality
>> group even w/ seeking, you would still be seeking to each row.
>>
>> Conversely if you have 10^2 rows in a tablet, each having many families.
>> If there is a column family you are interested in that only exists in 10
>> rows, you will still need to seek for each row to find it but ~100 seeks is
>> not so bad.
>>
>> [1]: https://github.com/apache/accumulo/blob/1.6.3/core/src/main/java/org/apache/accumulo/core/iterators/system/ColumnFamilySkippingIterator.java#L65
>> [2]: https://github.com/apache/accumulo/blob/1.6.3/core/src/main/java/org/apache/accumulo/core/iterators/system/ColumnQualifierFilter.java#L54
>> [3]: https://github.com/apache/accumulo/blob/1.6.3/server/tserver/src/main/java/org/apache/accumulo/tserver/Tablet.java#L2005
>>
>>> Thanks
>>> Mohit Kaushik
>>>
>>> On 09/28/2015 10:49 PM, Moises Baly wrote:
>>>
>>> Hi Keith,
>>>
>>> No I wasn't aware of that. So I'll move forward with the custom
>>> iterator.
>>>
>>> Thank you for your time,
>>>
>>> Moises
>>>
>>> On Mon, Sep 28, 2015 at 12:35 PM, Keith Turner <[email protected]> wrote:
>>>
>>>> On Mon, Sep 28, 2015 at 12:19 PM, Moises Baly <[email protected]> wrote:
>>>>
>>>>> Hi all:
>>>>>
>>>>> I would like to perform a range scan on a table, tweaking the
>>>>> definition of what goes into a particular key range. One way I can think
>>>>> of is writing a filter on the key, and that would work fine. But I think it
>>>>> would be slow compared to a scan / seek custom iterator. How does the
>>>>> underlying logic work? Does Filter go through all records, or, since the
>>>>> data is sorted, does it follow the same underlying logic as a scan? Would
>>>>> a custom iterator perform better?
>>>>
>>>> Yes, filter will read all data. Custom iterator that seeks may be
>>>> faster.
>>>>
>>>> Are you aware of the following?
>>>>
>>>> https://issues.apache.org/jira/browse/ACCUMULO-3961
>>>> https://github.com/apache/accumulo/pull/42
>>>>
>>>>> Thank you for your time,
>>>>>
>>>>> Moises
