Thanks to all! I like this solution. I confirmed what you said and will use scanner.setBatchSize() as appropriate.

--
D. Lam
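John's point in the quoted thread is that the scanner pulls key-value pairs in batches rather than one per loop iteration. The example below is a toy simulation of that, not Accumulo's actual client code. A hypothetical `BatchedIterator` hands out entries one at a time, like the `for` loop over a `Scanner`, but it refills its local buffer from a stand-in "server" list `batchSize` entries per simulated round trip. Reading only the first two entries shows the trade-off. A large batch transfers entries you never use. A batch smaller than what you need costs extra round trips.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BatchedScanDemo {

    /**
     * Hypothetical stand-in for a remote scanner: entries are handed out lazily,
     * but the local buffer is refilled from the "server" (here a plain list)
     * batchSize entries at a time. Each refill counts as one round trip.
     */
    static final class BatchedIterator implements Iterator<Long> {
        private final List<Long> server;   // pretend remote, already-sorted data
        private final int batchSize;
        private final List<Long> buffer = new ArrayList<>();
        private int serverPos = 0;         // next index to fetch from the "server"
        private int bufferPos = 0;         // next index to hand out from the buffer
        int roundTrips = 0;
        int entriesTransferred = 0;

        BatchedIterator(List<Long> server, int batchSize) {
            if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
            this.server = server;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNext() {
            if (bufferPos < buffer.size()) return true;
            if (serverPos >= server.size()) return false;
            // Buffer exhausted: fetch the next batch in one simulated round trip.
            int end = Math.min(serverPos + batchSize, server.size());
            buffer.clear();
            buffer.addAll(server.subList(serverPos, end));
            entriesTransferred += end - serverPos;
            serverPos = end;
            bufferPos = 0;
            roundTrips++;
            return true;
        }

        @Override
        public Long next() {
            if (!hasNext()) throw new NoSuchElementException();
            return buffer.get(bufferPos++);
        }
    }

    /** Reads only the first {@code wanted} entries, then stops iterating. */
    static BatchedIterator readFirst(List<Long> data, int batchSize, int wanted) {
        BatchedIterator it = new BatchedIterator(data, batchSize);
        // i < wanted is checked first, so no extra batch is fetched once we have enough.
        for (int i = 0; i < wanted && it.hasNext(); i++) {
            it.next();
        }
        return it;
    }

    public static void main(String[] args) {
        List<Long> data = new ArrayList<>();
        for (long t = 0; t < 5000; t++) data.add(t);

        for (int batchSize : new int[] {1000, 2, 1}) {
            BatchedIterator it = readFirst(data, batchSize, 2);
            System.out.printf("batchSize=%d: roundTrips=%d, entriesTransferred=%d%n",
                    batchSize, it.roundTrips, it.entriesTransferred);
        }
    }
}
```

With the values in `main`, the three runs report 1 round trip with 1000 entries transferred, 1 with 2, and 2 with 2. In a real Accumulo client the corresponding knob is `scanner.setBatchSize(int)`. The cost model here (exactly one round trip per batch) is a simplifying assumption.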
On Fri, Jun 29, 2012 at 2:14 PM, John Vines <[email protected]> wrote:
> If you set the end to null, it will go until the end of the table.
>
> Scanners bring back results in batches; the default is 1000 key-value
> pairs. If you know you're only looking for a specific number of Keys, you
> can drop the batch size to better match your needs. But if you end up
> grabbing multiple smaller batches, your performance will be dominated by
> network overhead costs.
>
> John
>
> On Fri, Jun 29, 2012 at 3:02 PM, Lam <[email protected]> wrote:
>>
>> This sounds like a good idea. But how do I scan forward -- do I set
>> end=null in the following code?
>>
>>
>> Scanner scan=conn.createScanner(tableName, auths);
>>
>> Text start=new Text(Value.longToBytes(beginTimestamp));
>> Text end=new Text(Value.longToBytes(endTimestamp));
>> scan.setRange(new Range(start, true, end, false));
>>
>> for(Entry<Key,Value> e:scan) ...
>>
>>
>> And is it efficient? i.e., the scanner won't move to the next entry
>> until the next iteration through the for loop, right?
>>
>> I'll run a test right now.
>>
>> --
>> D. Lam
>>
>>
>> On Fri, Jun 29, 2012 at 1:52 PM, Adam Fuchs <[email protected]> wrote:
>> > You can't scan backwards in Accumulo, but you probably don't need to.
>> > What you can do instead is use the last timestamp in the range as the
>> > key, like this:
>> >
>> > key=2 value={a.1 b.1 c.2 d.2}
>> > key=5 value={m.3 n.4 o.5}
>> > key=7 value={x.6 y.6 z.7}
>> >
>> > As long as your ranges are non-overlapping, you can just stop when you
>> > get to the first key/value pair that starts after your given time
>> > range. If your ranges are overlapping, then you will have to do a more
>> > complicated intersection between forward and reverse orderings to
>> > efficiently select ranges, or maybe use some type of hierarchical range
>> > intersection index akin to a binary space partitioning tree.
>> >
>> > Cheers,
>> > Adam
>> >
>> > On Fri, Jun 29, 2012 at 2:19 PM, Lam <[email protected]> wrote:
>> >>
>> >> I'm using a timestamp as a key, and the value is all the relevant data
>> >> starting at that timestamp up to the timestamp represented by the key
>> >> of the next row.
>> >>
>> >> When querying, I'm given a time span consisting of a start and stop
>> >> time. I want to return all the relevant data within the time span, so
>> >> I want to retrieve the appropriate rows (then filter the data for the
>> >> given time span).
>> >>
>> >> Example:
>> >> In Accumulo: (the format of the value is <letter>.<timestamp>)
>> >> key=1 value={a.1 b.1 c.2 d.2}
>> >> key=3 value={m.3 n.4 o.5}
>> >> key=6 value={x.6 y.6 z.7}
>> >>
>> >> Query: timespan=[2 4] (get all data from timestamp 2 to 4 inclusive)
>> >>
>> >> Desired result: retrieve key=1 and key=3, then filter out a.1, b.1, and
>> >> o.5, and return the rest.
>> >>
>> >> Problem: How do I know to retrieve key=1 and key=3 without scanning
>> >> all the keys?
>> >>
>> >> Can I create a scanner that looks for the given start key=2 and goes
>> >> to the prior row (i.e. key=1)?
>> >>
>> >> --
>> >> D. Lam
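Adam's layout (each row keyed by the last timestamp of its segment) turns the "go to the prior row" lookup into a plain forward scan. The scan starts at the query start and has an open end, which in Lam's snippet corresponds to passing `null` as the end row of the `Range`, per John's reply. The sketch below models the table with a `java.util.TreeMap` instead of Accumulo so that it runs standalone. `Event`, `Result`, and `query` are hypothetical names. It assumes non-overlapping segments. As a small variant of Adam's stopping rule, it stops right after the first row whose key is at or past the query end, since every later row must start after that key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TimeSpanScanDemo {

    /** One data item in the "<letter>.<timestamp>" format from the thread. */
    static final class Event {
        final String label;
        final long ts;

        Event(String label, long ts) {
            this.label = label;
            this.ts = ts;
        }

        @Override
        public String toString() {
            return label + "." + ts;
        }
    }

    /** Result of a query: matching events plus how many rows were read. */
    static final class Result {
        final List<Event> events = new ArrayList<>();
        int rowsRead = 0;
    }

    /**
     * Rows are keyed by the LAST timestamp in each segment (Adam's layout).
     * The TreeMap stands in for the sorted table; segments are assumed to be
     * non-overlapping. Returns every event with qStart <= ts <= qEnd.
     */
    static Result query(TreeMap<Long, List<Event>> table, long qStart, long qEnd) {
        Result r = new Result();
        // Analogous to Range(start, true, null, false): begin at the first row
        // whose key (segment end) is >= qStart, with no end bound.
        for (Map.Entry<Long, List<Event>> row : table.tailMap(qStart, true).entrySet()) {
            r.rowsRead++;
            for (Event e : row.getValue()) {
                if (e.ts >= qStart && e.ts <= qEnd) {
                    r.events.add(e);
                }
            }
            // This segment reaches qEnd, so every later segment starts after the span.
            if (row.getKey() >= qEnd) {
                break;
            }
        }
        return r;
    }

    /** Adam's example rows. */
    static TreeMap<Long, List<Event>> exampleTable() {
        TreeMap<Long, List<Event>> t = new TreeMap<>();
        t.put(2L, Arrays.asList(new Event("a", 1), new Event("b", 1),
                new Event("c", 2), new Event("d", 2)));
        t.put(5L, Arrays.asList(new Event("m", 3), new Event("n", 4), new Event("o", 5)));
        t.put(7L, Arrays.asList(new Event("x", 6), new Event("y", 6), new Event("z", 7)));
        return t;
    }

    public static void main(String[] args) {
        Result r = query(exampleTable(), 2, 4);
        System.out.println(r.events + " (rows read: " + r.rowsRead + ")");
    }
}
```

On Adam's example rows with the query [2 4], this prints `[c.2, d.2, m.3, n.4] (rows read: 2)`, matching the desired result in Lam's original message. One caveat applies to the real table. Accumulo orders row IDs by their bytes, not numerically, so the row encoding must preserve numeric order. A fixed-width big-endian encoding of non-negative longs does. Whether `Value.longToBytes` produces exactly that is worth checking for your Accumulo version.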
