Jon,

There is some magic, but unfortunately it's not yet implemented: ACCUMULO-652

Want to take over that project?

Adam

On Wed, Sep 2, 2015 at 5:14 PM, Parise, Jonathan <[email protected]> wrote:

> I was pretty sure this was the answer.
>
> Yes it makes sense to me. I was expecting this response. I was hoping for
> some magic I didn't know about, but not really expecting it.
>
> Thanks,
>
> Jon
>
> -----Original Message-----
> From: Josh Elser [mailto:[email protected]]
> Sent: Wednesday, September 02, 2015 5:13 PM
> To: [email protected]
> Subject: Re: Scanning In Timestamp Order
>
> Jon,
>
> Short answer: no.
>
> In RDBMS parlance, Accumulo has a single index. That index is the "row"
> portion of the Key class. This is the reason you see that as a "standard
> practice". Any other attempt to fetch data based on another component of
> the key (ignoring locality groups/column family subtleties) is an
> exhaustive scan of your dataset.
>
> If you are going to support this application for any duration of time, it
> is a good idea to take the penalty once in rewriting your old data into the
> new format to make all of your queries henceforth fast. If you have such a
> significant amount of data that you want to avoid running a large mapreduce
> task, you'll likely not want to make your users wait to read all of that
> data to answer every query :)
>
> Does that make sense?
>
> - Josh
>
> Parise, Jonathan wrote:
> > Hi,
> >
> > I was wondering if there is a way to scan a table based on the
> > timestamps. For example, is there a way to set a range based on the
> > timestamp portion of the key?
> >
> > I know that standard practice is to add a timestamp as part of the row
> > id, but in this particular case I probably cannot use that technique.
> > The reason I can't use it is that I need to find the most recent data
> > in a preexisting Accumulo instance. Not all of the information was
> > stored with timestamps as appended to the row id. I can't go back and
> > change the data, I just have to work with what is there.
> >
> > So, given a large amount of preexisting data without time information
> > in the row id, column family or column qualifier, how would you scan
> > for the most recent data?
> >
> > Specifically, is there any way to scan/sort by the timestamp portion
> > of the key. I did not see any way to make a Range with times.
> >
> > I also really do not want to run a job over all the data to make a new
> > copy of the table that is sorted. I have a lot of data here and such a
> > replication would take a very long time.
> >
> > Thanks,
> >
> > Jon
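
To illustrate the trade-off Josh describes, here is a minimal toy sketch in plain Python. It does not use Accumulo's API; the table contents and the function names (scan_row_range, scan_by_timestamp, rewrite_with_time_prefix, scan_recent) are all hypothetical and exist only for illustration. It models a table sorted by row only, shows that a row range can be found by seeking while a timestamp filter has to examine every entry, and then shows how a one-time rewrite that puts the timestamp at the front of the row turns the time query into a range scan.

```python
import bisect

# Toy table: (row, timestamp, value), kept sorted by row only.
# The row is the single "index"; the timestamp is not searchable by itself.
entries = sorted([
    ("user_a", 1441200000, "v1"),
    ("user_b", 1441230000, "v2"),
    ("user_c", 1441210000, "v3"),
    ("user_d", 1441235000, "v4"),
])
rows = [e[0] for e in entries]


def scan_row_range(start_row, end_row):
    """Seek straight to the range using the sort order (inclusive bounds)."""
    lo = bisect.bisect_left(rows, start_row)
    hi = bisect.bisect_right(rows, end_row)
    return entries[lo:hi]


def scan_by_timestamp(min_ts):
    """No index on the timestamp, so every entry has to be examined."""
    examined = 0
    hits = []
    for entry in entries:
        examined += 1
        if entry[1] >= min_ts:
            hits.append(entry)
    return hits, examined


def rewrite_with_time_prefix(table):
    """One-time rewrite: zero-padded timestamp becomes the row prefix."""
    return sorted((f"{ts:013d}_{row}", ts, value) for row, ts, value in table)


def scan_recent(time_sorted, min_ts):
    """After the rewrite, 'most recent' is an ordinary row range scan."""
    new_rows = [e[0] for e in time_sorted]
    lo = bisect.bisect_left(new_rows, f"{min_ts:013d}")
    return time_sorted[lo:]


time_sorted = rewrite_with_time_prefix(entries)
```

With this toy data, scan_by_timestamp(1441230000) finds two entries but examines all four, while scan_recent(time_sorted, 1441230000) seeks directly to the same two entries. The rewrite itself still reads everything once, which is the one-time penalty mentioned above.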
