Oh! I see! KeyOnlyFilter is overwriting the RandomRowFilter! Bad. I
mean, it's bad that I didn't figure that out myself. Thanks for
pointing it out. That definitely explains the difference in performance.
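For the record, my broken code was doing roughly this (a sketch from
memory; the 0.5f chance is just an example value). The second
setFilter() call silently replaces the first filter instead of
combining the two:
Scan scan = new Scan();
scan.setFilter(new RandomRowFilter(0.5f));
// This call throws away the RandomRowFilter set just above:
scan.setFilter(new KeyOnlyFilter());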
I have activated the bloom filters with this code:
HBaseAdmin admin = new HBaseAdmin(config);
HTable table = new HTable(config, "test3");
HColumnDescriptor cd = table.getTableDescriptor().getColumnFamilies()[0];
System.out.println(cd); // descriptor before the change
cd.setBloomFilterType(BloomType.ROW);
// The table has to be disabled before the column family can be modified.
admin.disableTable("test3");
admin.modifyColumn("test3", cd);
admin.enableTable("test3");
System.out.println(table.getTableDescriptor().getColumnFamilies()[0]); // after
And here are the results for the first attempt (using gets):
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
'true', BLOCKCACHE => 'true'}
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
'true', BLOCKCACHE => 'true'}
Thu Jun 28 11:08:59 EDT 2012 Processing iteration 0...
Time to read 1000 lines : 40177.0 mseconds (25 lines/seconds)
2nd: Time to read 1000 lines : 7621.0 mseconds (131 lines/seconds)
3rd: Time to read 1000 lines : 7659.0 mseconds (131 lines/seconds)
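(That timing comes from a simple loop, essentially like this; it's
simplified, and rowKeys is a placeholder for the list of keys my test
reads:)
long timeBefore = System.currentTimeMillis();
int read = 0;
for (byte[] rowKey : rowKeys) {
    if (!table.get(new Get(rowKey)).isEmpty())
        read++;
}
long elapsed = System.currentTimeMillis() - timeBefore;
System.out.println("Time to read " + read + " lines : " + elapsed
    + " mseconds (" + (read * 1000L / elapsed) + " lines/seconds)");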
After a few more iterations (about 30), I'm between 200 and 250
lines/second, just like before.
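Maybe that's because the existing store files were written before the
change? If I understand correctly, bloom filters are stored inside the
HFiles themselves, so the old files won't have them until they are
rewritten; I guess I should force that with a major compaction,
something like:
admin.majorCompact("test3");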
Regarding the FilterList, I tried it, but now I'm getting this error
from the servers:
org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-6376193724680783311' does not exist
Here is the code:
final int linesToRead = 10000;
System.out.println(new java.util.Date() + " Processing iteration " + iteration + "... ");
RandomRowFilter rrf = new RandomRowFilter();
KeyOnlyFilter kof = new KeyOnlyFilter();
Scan scan = new Scan();
// Combine the two filters instead of overwriting one with the other.
List<Filter> filters = new ArrayList<Filter>();
filters.add(rrf);
filters.add(kof);
FilterList filterList = new FilterList(filters); // MUST_PASS_ALL by default
scan.setFilter(filterList);
scan.setBatch(Math.min(linesToRead, 1000));
scan.setCaching(Math.min(linesToRead, 1000));
ResultScanner scanner = table.getScanner(scan);
processed = 0;
long timeBefore = System.currentTimeMillis();
for (Result result : scanner.next(linesToRead)) {
    System.out.println("Result: " + result);
    if (result != null)
        processed++;
}
scanner.close();
It's failing when I call scanner.next(linesToRead) in the for loop. I
tried with linesToRead = 1000, 100, 10 and 1, always with the same
result :(
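One variant I still want to try is iterating the scanner directly
instead of asking for all the rows in a single next() call, in case
the bulk call is what lets the lease expire:
for (Result result : scanner) {
    processed++;
    if (processed >= linesToRead)
        break;
}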
I will try to find the root cause, but if you have any hints, they're welcome.
JM