I was pretty sure this was the answer, and yes, it makes sense to me. I was hoping for some magic I didn't know about, but wasn't really expecting it.
Thanks,
Jon

-----Original Message-----
From: Josh Elser [mailto:[email protected]]
Sent: Wednesday, September 02, 2015 5:13 PM
To: [email protected]
Subject: Re: Scanning In Timestamp Order

Jon,

Short answer: no.

In RDBMS parlance, Accumulo has a single index: the "row" portion of the Key class. That is why you see adding the timestamp to the row ID described as "standard practice". Any attempt to fetch data based on another component of the key (ignoring locality group/column family subtleties) is an exhaustive scan of your dataset.

If you are going to support this application for any length of time, it is a good idea to take the penalty once: rewrite your old data into the new format so that all of your queries from then on are fast. If you have so much data that you want to avoid running a large MapReduce job, you likely also don't want to make your users wait while every query reads all of that data :)

Does that make sense?

- Josh

Parise, Jonathan wrote:
> Hi,
>
> I was wondering if there is a way to scan a table based on
> timestamps. For example, is there a way to set a range based on the
> timestamp portion of the key?
>
> I know that standard practice is to add a timestamp as part of the row
> ID, but in this particular case I probably cannot use that technique.
> The reason I can't use it is that I need to find the most recent data
> in a preexisting Accumulo instance. Not all of the information was
> stored with the timestamp appended to the row ID. I can't go back and
> change the data; I just have to work with what is there.
>
> So, given a large amount of preexisting data without time information
> in the row ID, column family, or column qualifier, how would you scan
> for the most recent data?
>
> Specifically, is there any way to scan/sort by the timestamp portion
> of the key? I did not see any way to make a Range with times.
>
> I also really do not want to run a job over all the data to make a new
> copy of the table that is sorted. I have a lot of data here, and such a
> replication would take a very long time.
>
> Thanks,
>
> Jon
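[Editor's note] Josh's point — that the row is the only indexed component of the key, so any timestamp-based lookup degenerates into an exhaustive scan — can be illustrated with a small, self-contained sketch. This uses plain `java.util`, not the Accumulo client API; the class and method names here are made up for illustration:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy analogue of an Accumulo table: entries are kept sorted by row
// (the only "index"), while the timestamp is just data carried along.
public class SingleIndexSketch {

    // Build a tiny sorted "table": row -> timestamp (millis).
    public static NavigableMap<String, Long> sampleTable() {
        NavigableMap<String, Long> table = new TreeMap<>();
        table.put("row_a", 1000L);
        table.put("row_b", 3000L);
        table.put("row_c", 2000L);
        return table;
    }

    // Query by row: a cheap range seek, because the map is sorted by row.
    public static int countRowsInRange(NavigableMap<String, Long> table,
                                       String startRow, String endRow) {
        return table.subMap(startRow, true, endRow, true).size();
    }

    // Query by timestamp: nothing is sorted by it, so every entry must be
    // examined -- the analogue of Accumulo's exhaustive scan.
    public static String newestRow(Map<String, Long> table) {
        String newestRow = null;
        long newestTs = Long.MIN_VALUE;
        for (Map.Entry<String, Long> e : table.entrySet()) {
            if (e.getValue() > newestTs) {
                newestTs = e.getValue();
                newestRow = e.getKey();
            }
        }
        return newestRow;
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> table = sampleTable();
        System.out.println("rows in [row_a, row_b]: "
                + countRowsInRange(table, "row_a", "row_b"));
        System.out.println("newest row: " + newestRow(table));
    }
}
```

In real Accumulo, a server-side filter iterator (such as the bundled TimestampFilter) can restrict the results a scan returns to a time window, but it does not change the access pattern: the tablet servers still read every key to decide what passes the filter, which is exactly the exhaustive scan Josh describes.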
