On Tue, Sep 29, 2015 at 12:59 AM, mohit.kaushik <[email protected]> wrote:

> Hi Keith,
>
> When we fetch a column or column family, it seems it does not seek and
> only scans by filtering the key/value pairs. But as you said, if I design a
> custom iterator to fetch a column family, it may work faster.
>

When column families are fetched, Accumulo will seek [1]. It tries to read 10 cells and then seeks. When fetching family and qualifier, two iterators are used: the ColumnFamilySkippingIterator and the ColumnQualifierFilter. The ColumnQualifierFilter does a scan of all qualifiers within a family [2]. The system configures the qualifier filter to have the family skipping iterator as its source [3], so it can still seek between families.

>
> But I want to know what the scenario would be if I define a locality group
> for the column family and run the same custom iterator on it, which both
> scans and seeks. What would be the impact on performance (gain or loss)?
>

Like Josh said, it really depends on your situation. It's hard to offer an opinion without knowing more about the schema and the queries. Below I expanded on what Josh mentioned.

If you have a locality group, it can really help in the case where you have many rows that have a few families. For example, if you have 10^7 rows in a tablet and only 10^3 have a certain column family that's in a locality group, it can make it very fast to find those 1000 rows. Without a locality group, even with seeking, you would still be seeking to each row.

Conversely, suppose you have 10^2 rows in a tablet, each having many families. If the column family you are interested in only exists in 10 rows, you will still need to seek for each row to find it, but ~100 seeks is not so bad.
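To make the seek-versus-scan trade-off above more concrete, here is a rough, hypothetical model in plain Java. It does not use Accumulo's classes. A `TreeMap` stands in for a tablet, `higherKey` plays the role of `next()`, and `ceilingKey` plays the role of a seek. The skipping fetch loosely imitates the "read 10 cells and then seek" behaviour for a single fetched family. The locality group is modelled simply as a separate sorted map that holds only that family. The class name, the key encoding (components joined by a NUL separator), and the data set are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified model of one tablet; this is NOT Accumulo code.
// A cell key is row + SEP + family + SEP + qualifier, so the TreeMap orders cells
// by row, then family, then qualifier (assuming SEP never occurs inside a component).
public class FamilySkipModel {
    static final String SEP = "\u0000";
    static final String NEXT_ROW = "\u0001";  // row + NEXT_ROW sorts after every cell of that row
    static final int NEXTS_BEFORE_SEEK = 10;  // loosely mimics "read 10 cells and then seek"

    static final class Stats {
        int cellsRead;
        int seeks;
        final List<String> matches = new ArrayList<>();
    }

    static String key(String row, String fam, String qual) {
        return row + SEP + fam + SEP + qual;
    }

    // Filter-style fetch: read every cell and keep those in the wanted family.
    static Stats filterFetch(TreeMap<String, String> tablet, String fam) {
        Stats s = new Stats();
        for (String k : tablet.keySet()) {
            s.cellsRead++;
            if (k.split(SEP, -1)[1].equals(fam)) {
                s.matches.add(k);
            }
        }
        return s;
    }

    // Skipping fetch: advance with next() (higherKey) up to 10 times past unwanted cells,
    // then seek (ceilingKey) to the wanted family in this row, or to the next row if past it.
    static Stats skippingFetch(TreeMap<String, String> tablet, String fam) {
        Stats s = new Stats();
        String k = tablet.isEmpty() ? null : tablet.firstKey();
        int misses = 0;
        while (k != null) {
            s.cellsRead++;
            String[] parts = k.split(SEP, -1);
            int cmp = parts[1].compareTo(fam);
            if (cmp == 0) {
                s.matches.add(k);
                misses = 0;
                k = tablet.higherKey(k);
            } else if (misses < NEXTS_BEFORE_SEEK) {
                misses++;
                k = tablet.higherKey(k);
            } else {
                s.seeks++;
                misses = 0;
                k = cmp < 0
                        ? tablet.ceilingKey(parts[0] + SEP + fam + SEP)
                        : tablet.ceilingKey(parts[0] + NEXT_ROW);
            }
        }
        return s;
    }

    // Made-up data: 100 rows; every 10th row has family "m". With withOtherFamilies=true each
    // row also has families f00..f19 (one cell each); with false only the "m" cells remain,
    // which stands in for "m" living in its own locality group.
    static TreeMap<String, String> buildCells(boolean withOtherFamilies) {
        TreeMap<String, String> cells = new TreeMap<>();
        for (int r = 0; r < 100; r++) {
            String row = String.format("r%03d", r);
            if (withOtherFamilies) {
                for (int f = 0; f < 20; f++) {
                    cells.put(key(row, String.format("f%02d", f), "q"), "v");
                }
            }
            if (r % 10 == 0) {
                cells.put(key(row, "m", "q"), "v");
            }
        }
        return cells;
    }

    static void report(String label, Stats s) {
        System.out.printf("%-15s read=%d seeks=%d matches=%d%n",
                label, s.cellsRead, s.seeks, s.matches.size());
    }

    public static void main(String[] args) {
        TreeMap<String, String> whole = buildCells(true);  // no locality group
        TreeMap<String, String> mOnly = buildCells(false); // "m" in its own group
        report("filter", filterFetch(whole, "m"));
        report("skipping", skippingFetch(whole, "m"));
        report("locality group", skippingFetch(mOnly, "m"));
    }
}
```

On this made-up data, the model gives the following counts:

- The filter reads all 2010 cells.
- The skipping fetch reads 1110 cells and does 100 seeks, one per row, as in the 10^2-row case above.
- The locality-group variant reads only the 10 matching cells, with no seeks.

Real costs also depend on file layout, block caching, and index lookups, which this sketch ignores.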

[1]: https://github.com/apache/accumulo/blob/1.6.3/core/src/main/java/org/apache/accumulo/core/iterators/system/ColumnFamilySkippingIterator.java#L65
[2]: https://github.com/apache/accumulo/blob/1.6.3/core/src/main/java/org/apache/accumulo/core/iterators/system/ColumnQualifierFilter.java#L54
[3]: https://github.com/apache/accumulo/blob/1.6.3/server/tserver/src/main/java/org/apache/accumulo/tserver/Tablet.java#L2005

>
> Thanks
> Mohit Kaushik
>
>
> On 09/28/2015 10:49 PM, Moises Baly wrote:
>
> Hi Keith,
>
> No, I wasn't aware of that. So I'll move forward with the custom iterator.
>
> Thank you for your time,
>
> Moises
>
> On Mon, Sep 28, 2015 at 12:35 PM, Keith Turner <[email protected]> wrote:
>
>> On Mon, Sep 28, 2015 at 12:19 PM, Moises Baly <[email protected]>
>> wrote:
>>
>>> Hi all:
>>>
>>> I would like to perform a range scan on a table, tweaking the definition
>>> of what goes into a particular key range. One way I can think of is writing
>>> a filter on the key, and that would work fine. But I think it would be slow
>>> compared to a scan / seek custom iterator. How does the underlying logic
>>> work? Does Filter go through all records, or, since the data is sorted, does
>>> it follow the same underlying logic as a scan? Would a custom iterator
>>> perform better?
>>>
>>
>> Yes, filter will read all data. Custom iterator that seeks may be faster.
>>
>> Are you aware of the following?
>>
>> https://issues.apache.org/jira/browse/ACCUMULO-3961
>> https://github.com/apache/accumulo/pull/42
>>
>>>
>>> Thank you for your time,
>>>
>>> Moises
>>>
>
> --
> Mohit Kaushik
> Software Engineer
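The original question was whether to use a filter or a custom iterator that seeks. A small, hypothetical model in plain Java can illustrate why "filter will read all data" while a seeking iterator may be faster. It is not an implementation of Accumulo's `Filter` or `SortedKeyValueIterator`. In this sketch the "tweaked" range is assumed to be a set of [start, end) row sub-ranges. The filter variant reads and tests every row. The seeking variant jumps to the start of the next sub-range as soon as it sees a row it does not want. The class name, sub-ranges, and data are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, NOT Accumulo's Filter or iterator API. The "tweaked" range is a
// set of [start, end) row sub-ranges (a TreeMap from start to end) inside the scan.
public class RangeSeekModel {

    static final class Stats {
        int rowsRead;
        int seeks;
        final List<String> matches = new ArrayList<>();
    }

    // True if the row falls inside one of the [start, end) sub-ranges.
    static boolean wanted(TreeMap<String, String> subRanges, String row) {
        Map.Entry<String, String> e = subRanges.floorEntry(row);
        return e != null && row.compareTo(e.getValue()) < 0;
    }

    // Filter-style: every row in the scan range is read and tested.
    static Stats filterScan(TreeMap<String, String> rows, TreeMap<String, String> subRanges) {
        Stats s = new Stats();
        for (String row : rows.keySet()) {
            s.rowsRead++;
            if (wanted(subRanges, row)) {
                s.matches.add(row);
            }
        }
        return s;
    }

    // Seeking: on an unwanted row, jump to the start of the next sub-range instead of
    // reading the rows in between; stop once no sub-range is left.
    static Stats seekingScan(TreeMap<String, String> rows, TreeMap<String, String> subRanges) {
        Stats s = new Stats();
        String row = rows.isEmpty() ? null : rows.firstKey();
        while (row != null) {
            s.rowsRead++;
            if (wanted(subRanges, row)) {
                s.matches.add(row);
                row = rows.higherKey(row);
            } else {
                String nextStart = subRanges.higherKey(row);
                if (nextStart == null) {
                    break;
                }
                s.seeks++;
                row = rows.ceilingKey(nextStart);
            }
        }
        return s;
    }

    // Made-up data: rows "0000".."9999" (one cell each).
    static TreeMap<String, String> buildRows() {
        TreeMap<String, String> rows = new TreeMap<>();
        for (int i = 0; i < 10000; i++) {
            rows.put(String.format("%04d", i), "v");
        }
        return rows;
    }

    // Made-up sub-ranges: [0000, 0010), [1000, 1010), ..., [9000, 9010).
    static TreeMap<String, String> buildSubRanges() {
        TreeMap<String, String> subRanges = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            subRanges.put(String.format("%04d", i * 1000), String.format("%04d", i * 1000 + 10));
        }
        return subRanges;
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = buildRows();
        TreeMap<String, String> subRanges = buildSubRanges();
        for (Stats s : new Stats[] {filterScan(rows, subRanges), seekingScan(rows, subRanges)}) {
            System.out.printf("read=%d seeks=%d matches=%d%n", s.rowsRead, s.seeks, s.matches.size());
        }
    }
}
```

Here both variants return the same 100 rows. The filter reads all 10,000 rows, while the seeking variant reads 110 rows and seeks 9 times. The benefit depends on how far apart the wanted keys are. When they are close together, a few `next()` calls may be cheaper than a seek, which is presumably why the family skipping iterator tries reading a few cells before it seeks.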
