On Tue, Aug 28, 2012 at 9:51 AM, <[email protected]> wrote:

> Billie
>
> Your comment “Users should be aware that this is not an efficient
> operation, though.” may help me decide if my current use of a secondary
> time index is better then. Right now I maintain a table that has
> timestamps as the rowid whose values are the rowid in a metadata table.
> Therefore I do one range scan based on the timestamp. Then a second lookup
> of the metadata rowid. Is this more efficient?
>
It probably depends on what percentage of the data you're bringing back, compared to the amount you're scanning over (if that's not the whole table). I would hypothesize that if you're bringing more than N% of the data back, you might as well just use the TimestampFilter on the main table. If you're bringing a smaller percentage back, it could be better to maintain a secondary time index and so reduce the amount of the main table you have to scan over. I'm not sure what N would be.

You should also make sure that the secondary index is actually reducing the amount of the main table you're scanning over. For example, if each rowid had a full range of timestamps, you could be pulling a list of all rowids back from the index table and not reducing the scan over the main table at all.

Also, the TimestampFilter is not optimized. Filters evaluate each key/value pair to see if it is accepted (in this case, whether it falls in a timestamp range). If there are a lot of timestamps for each cell (keys that are identical except for timestamp), it would be better to seek past the unwanted versions instead of evaluating each one. That would involve writing a new iterator. If there aren't many timestamps for each cell, seeking won't help and the TimestampFilter will be fine.

Billie

> *From:* Billie Rinaldi [mailto:[email protected]]
> *Sent:* Tuesday, August 28, 2012 11:46
> *To:* [email protected]; [email protected]
> *Subject:* Re: TimeSpan Iterator
>
> On Tue, Aug 28, 2012 at 6:33 AM, John Armstrong <[email protected]> wrote:
>
> On 08/28/2012 09:26 AM, [email protected] wrote:
>
> Does anyone know of a TimeSpan Iterator that will fetch rows based on
> the accumulo timestamp?
>
> We actually wrote our own TimestampRangeIterator and TimestampSetIterator
> classes. I don't know if 1.4 has any in the core libraries. It's not very
> hard though.
>
> There's a TimestampFilter in org.apache.accumulo.core.iterators.user in
> 1.4. It uses a range of timestamps.
> Users should be aware that this is not an efficient operation, though.
>
> Billie
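
A rough way to see the trade-off between filtering the main table and going through a secondary time index, with no Accumulo dependency: the sketch below models both tables as in-memory sorted maps and counts how many entries each approach touches. The class name, the method names and the entry-count cost measure are all assumptions made for illustration. None of this is Accumulo's API, and real costs also depend on seeks, network round trips and how the rowids are distributed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model, not Accumulo code.
public class TimeIndexSketch {
    // Stand-in for the main (metadata) table: rowid -> timestamp.
    final TreeMap<String, Long> mainTable = new TreeMap<>();
    // Stand-in for the secondary index table: timestamp -> rowids.
    final TreeMap<Long, List<String>> timeIndex = new TreeMap<>();
    // Entries touched by the most recent query (toy cost measure).
    public long entriesExamined = 0;

    // Writes to both tables. Assumes each rowid is written once,
    // so the index never holds stale entries.
    public void put(String rowId, long timestamp) {
        mainTable.put(rowId, timestamp);
        timeIndex.computeIfAbsent(timestamp, k -> new ArrayList<>()).add(rowId);
    }

    // Filter approach: look at every entry in the main table and
    // keep those whose timestamp is in [start, end].
    public List<String> filterScan(long start, long end) {
        entriesExamined = 0;
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Long> e : mainTable.entrySet()) {
            entriesExamined++;
            if (e.getValue() >= start && e.getValue() <= end) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Index approach: one range scan over the index, then one
    // lookup in the main table per rowid found.
    public List<String> indexLookup(long start, long end) {
        entriesExamined = 0;
        List<String> hits = new ArrayList<>();
        for (List<String> rowIds : timeIndex.subMap(start, true, end, true).values()) {
            for (String rowId : rowIds) {
                entriesExamined++;               // the index entry
                if (mainTable.containsKey(rowId)) {
                    entriesExamined++;           // the main-table lookup
                    hits.add(rowId);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TimeIndexSketch t = new TimeIndexSketch();
        for (int i = 0; i < 1000; i++) {
            t.put(String.format("row%04d", i), i);
        }
        t.filterScan(10, 19);
        System.out.println("filter, narrow range: " + t.entriesExamined);
        t.indexLookup(10, 19);
        System.out.println("index, narrow range:  " + t.entriesExamined);
        t.indexLookup(0, 999);
        System.out.println("index, whole table:   " + t.entriesExamined);
    }
}
```

In this toy model, a range of 10 timestamps out of 1000 rows touches 20 entries through the index against 1000 through the filter, while a range covering the whole table touches 2000 through the index. Where the crossover falls on a real cluster is something to measure rather than read off this sketch.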
