Hey everyone,
I'm trying to improve the query performance of batch scans on my data table. I
first scan over index tables (roughly sketched below), which return a set of
rowIDs corresponding to the records I am interested in. This set of records is
fairly randomly (and uniformly) distributed across a large number of tablets,
due to the randomness of the UIDs and of the query itself. Then I want to scan
over my data table, which is set up as follows:
row      colFam   colQual   value
rowUID   --       --        byte[] of data
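For context, the index phase that produces those rowIDs looks roughly like
this; the index table name, the searchTerm variable, and the assumption that
the matching rowUID sits in the column qualifier are simplifications of my
actual schema:

// sketch of the index lookup that produces the rowIDs (simplified)
Scanner indexScanner = conn.createScanner("indexTable", new Authorizations());
indexScanner.setRange(new Range(new Text(searchTerm)));
Set<Text> rowIDs = new HashSet<Text>();
for (Map.Entry<Key, Value> entry : indexScanner) {
    // the data-table rowUID is stored in the column qualifier of each index entry
    rowIDs.add(entry.getKey().getColumnQualifier());
}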
These records are fairly small (hundreds of bytes each) but numerous (a single
query may return 50,000 or more). The method I use is below: essentially, I
turn the rowIDs returned from the first query into a set of single-row ranges,
hand those to a BatchScanner, and collect the value of every entry it returns.
// returns the data associated with the given collection of rows
public Collection<byte[]> getRowData(Collection<Text> rows, Text dataType,
        String tablename, int queryThreads) throws TableNotFoundException {
    List<byte[]> values = new ArrayList<byte[]>(rows.size());
    if (!rows.isEmpty()) {
        BatchScanner scanner = conn.createBatchScanner(tablename,
                new Authorizations(), queryThreads);
        try {
            // one single-row range per rowID returned by the index query
            List<Range> ranges = new ArrayList<Range>(rows.size());
            for (Text row : rows) {
                ranges.add(new Range(row));
            }
            scanner.setRanges(ranges);
            // collect the value of every entry the batch scanner returns
            for (Map.Entry<Key, Value> entry : scanner) {
                values.add(entry.getValue().get());
            }
        } finally {
            scanner.close();
        }
    }
    return values;
}
Is there a more efficient way to do this? I have index caches and bloom filters
enabled (data caches are not), but I still seem to have a long query lag. Any
thoughts on how I can improve this?
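In case it's relevant, here is roughly how those settings were applied (the
property names are the standard per-table configuration options; the
data/block cache line is commented out since I left it at its default):

// enable bloom filters and the index cache on the data table
conn.tableOperations().setProperty(tablename, "table.bloom.enabled", "true");
conn.tableOperations().setProperty(tablename, "table.cache.index.enable", "true");
// data (block) cache is left disabled:
// conn.tableOperations().setProperty(tablename, "table.cache.block.enable", "true");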
Thanks,
David