I was pretty sure this was the answer, and yes, it makes sense to me. I was hoping for some magic I didn't know about, but wasn't really expecting it.
Thanks,
Jon

-----Original Message-----
From: Josh Elser [mailto:[email protected]]
Sent: Wednesday, September 02, 2015 5:13 PM
To: [email protected]
Subject: Re: Scanning In Timestamp Order

Jon,

Short answer: no.

In RDBMS parlance, Accumulo has a single index: the "row" portion of the Key class. That is why you see adding the timestamp to the row ID described as "standard practice". Any attempt to fetch data based on another component of the key (ignoring locality group/column family subtleties) is an exhaustive scan of your dataset.

If you are going to support this application for any length of time, it is a good idea to take the penalty once: rewrite your old data into the new format so that all of your queries from then on are fast. If you have so much data that you want to avoid running a large MapReduce job, you likely also don't want to make your users wait while every query reads all of that data :)

Does that make sense?

- Josh

Parise, Jonathan wrote:
> Hi,
>
> I was wondering if there is a way to scan a table based on
> timestamps. For example, is there a way to set a range based on the
> timestamp portion of the key?
>
> I know that standard practice is to add a timestamp as part of the row
> ID, but in this particular case I probably cannot use that technique.
> The reason I can't use it is that I need to find the most recent data
> in a preexisting Accumulo instance. Not all of the information was
> stored with the timestamp appended to the row ID. I can't go back and
> change the data; I just have to work with what is there.
>
> So, given a large amount of preexisting data without time information
> in the row ID, column family, or column qualifier, how would you scan
> for the most recent data?
>
> Specifically, is there any way to scan/sort by the timestamp portion
> of the key? I did not see any way to make a Range with times.
>
> I also really do not want to run a job over all the data to make a new
> copy of the table that is sorted. I have a lot of data here, and such a
> replication would take a very long time.
>
> Thanks,
>
> Jon
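[Editor's note] Josh's point — that the row is the only indexed component of the key, so any timestamp-based lookup degenerates into an exhaustive scan — can be illustrated with a small, self-contained sketch. This uses plain `java.util`, not the Accumulo client API; the class and method names here are made up for illustration:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy analogue of an Accumulo table: entries are kept sorted by row
// (the only "index"), while the timestamp is just data carried along.
public class SingleIndexSketch {

    // Build a tiny sorted "table": row -> timestamp (millis).
    public static NavigableMap<String, Long> sampleTable() {
        NavigableMap<String, Long> table = new TreeMap<>();
        table.put("row_a", 1000L);
        table.put("row_b", 3000L);
        table.put("row_c", 2000L);
        return table;
    }

    // Query by row: a cheap range seek, because the map is sorted by row.
    public static int countRowsInRange(NavigableMap<String, Long> table,
                                       String startRow, String endRow) {
        return table.subMap(startRow, true, endRow, true).size();
    }

    // Query by timestamp: nothing is sorted by it, so every entry must be
    // examined -- the analogue of Accumulo's exhaustive scan.
    public static String newestRow(Map<String, Long> table) {
        String newestRow = null;
        long newestTs = Long.MIN_VALUE;
        for (Map.Entry<String, Long> e : table.entrySet()) {
            if (e.getValue() > newestTs) {
                newestTs = e.getValue();
                newestRow = e.getKey();
            }
        }
        return newestRow;
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> table = sampleTable();
        System.out.println("rows in [row_a, row_b]: "
                + countRowsInRange(table, "row_a", "row_b"));
        System.out.println("newest row: " + newestRow(table));
    }
}
```

In real Accumulo, a server-side filter iterator (such as the bundled TimestampFilter) can restrict the results a scan returns to a time window, but it does not change the access pattern: the tablet servers still read every key to decide what passes the filter, which is exactly the exhaustive scan Josh describes.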
