Re: Scanning In Timestamp Order

Adam Fuchs Wed, 02 Sep 2015 14:16:55 -0700

Jon,

There is some magic, but unfortunately it's not yet implemented:
ACCUMULO-652


Want to take over that project?

Adam

On Wed, Sep 2, 2015 at 5:14 PM, Parise, Jonathan <[email protected]>
wrote:

> I was pretty sure this was the answer.
>
> Yes, it makes sense to me. I was expecting this response. I was hoping for
> some magic I didn't know about, but not really expecting it.
>
> Thanks,
>
> Jon
>
> -----Original Message-----
> From: Josh Elser [mailto:[email protected]]
> Sent: Wednesday, September 02, 2015 5:13 PM
> To: [email protected]
> Subject: Re: Scanning In Timestamp Order
>
> Jon,
>
> Short answer: no.
>
> In RDBMS parlance, Accumulo has a single index. That index is the "row"
> portion of the Key class. This is the reason you see that as a "standard
> practice". Any other attempt to fetch data based on another component of
> the key (ignoring locality groups/column family subtleties) is an
> exhaustive scan of your dataset.
>
> If you are going to support this application for any duration of time, it
> is a good idea to take the penalty once in rewriting your old data into the
> new format to make all of your queries henceforth fast. If you have such a
> significant amount of data that you want to avoid running a large mapreduce
> task, you'll likely not want to make your users wait to read all of that
> data to answer every query :)
>
> Does that make sense?
>
> - Josh
>
> Parise, Jonathan wrote:
> > Hi,
> >
> > I was wondering if there is a way to scan a table based on the
> > timestamps. For example, is there a way to set a range based on the
> > timestamp portion of the key?
> >
> > I know that standard practice is to add a timestamp as part of the row
> > id, but in this particular case I probably cannot use that technique.
> > The reason I can't use it is that I need to find the most recent data
> > in a preexisting Accumulo instance. Not all of the information was
> > stored with timestamps appended to the row id. I can't go back and
> > change the data; I just have to work with what is there.
> >
> > So, given a large amount of preexisting data without time information
> > in the row id, column family or column qualifier, how would you scan
> > for the most recent data?
> >
> > Specifically, is there any way to scan/sort by the timestamp portion
> > of the key? I did not see any way to make a Range with times.
> >
> > I also really do not want to run a job over all the data to make a new
> > copy of the table that is sorted. I have a lot of data here and such a
> > replication would take a very long time.
> >
> > Thanks,
> >
> > Jon
> >
>
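Josh's point above, that the row is the only index and that a query on any other key component means reading everything, can be illustrated with a small toy model. This is only a sketch in plain Python, not the Accumulo API: the table is a sorted list of (row, timestamp, value) tuples, and the row ids, timestamps, and function names are all hypothetical. It contrasts a timestamp query on the original layout (every entry examined) with a row-range scan after a one-time rewrite that puts a zero-padded timestamp at the front of the row id.

```python
import bisect

# Toy stand-in for a sorted key-value table. Entries are kept sorted by
# row id, which is the only component that can be seeked on.
# Hypothetical data; this is not the Accumulo API.
entries = sorted([
    ("sensor-a", 1441200000, "v1"),
    ("sensor-b", 1441230000, "v2"),
    ("sensor-c", 1441210000, "v3"),
    ("sensor-d", 1441235000, "v4"),
])


def scan_by_timestamp(table, start_ts, end_ts):
    """Timestamp is not indexed, so every entry has to be examined."""
    examined = 0
    hits = []
    for row, ts, value in table:
        examined += 1
        if start_ts <= ts < end_ts:
            hits.append((row, ts, value))
    return hits, examined


def rewrite_with_time_in_row(table):
    """One-time rewrite: prefix each row id with a zero-padded timestamp."""
    return sorted((f"{ts:012d}_{row}", ts, value) for row, ts, value in table)


def scan_by_row_range(table, start_row, end_row):
    """Rows are sorted, so binary search locates the range boundaries.

    Building `rows` here is linear only because this is a toy; a real
    sorted store would seek using its existing index instead.
    """
    rows = [row for row, _, _ in table]
    lo = bisect.bisect_left(rows, start_row)
    hi = bisect.bisect_left(rows, end_row)
    return table[lo:hi], hi - lo


# Same question asked of both layouts: entries in [1441230000, 1441240000).
full_hits, full_examined = scan_by_timestamp(entries, 1441230000, 1441240000)

rewritten = rewrite_with_time_in_row(entries)
range_hits, range_examined = scan_by_row_range(
    rewritten, f"{1441230000:012d}", f"{1441240000:012d}"
)
```

Both queries return the same two values, but the first touches all four entries while the second touches only the two inside the row range. On a large table that gap is the cost Josh describes: pay once for the rewrite, or pay for a full read on every query.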
