For the filter list, my guess is that you're filtering out all rows
because RandomRowFilter#chance is never initialized: the no-arg
constructor leaves it at the Java default of 0.0f, so every row is
rejected. It should be something like
RandomRowFilter rrf = new RandomRowFilter(0.5f);
That would also explain the LeaseException: with nothing passing the
filter, the server has to scan the whole table before returning
anything, which can easily outlive the scanner lease.
But note that this test will never be comparable to the test with a
list of gets. You can make it as slow or as fast as you want by
playing with the 'chance' parameter.
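
A minimal sketch of what I mean, dropping into your snippet below
(the 0.5f is just an example value, tune it as you like):

// Keep ~50% of the rows; KeyOnlyFilter then strips the values so
// only the keys come back.
RandomRowFilter rrf = new RandomRowFilter(0.5f); // chance must be set
KeyOnlyFilter kof = new KeyOnlyFilter();
List<Filter> filters = new ArrayList<Filter>();
filters.add(rrf);
filters.add(kof);
Scan scan = new Scan();
scan.setFilter(new FilterList(filters));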

The results with gets and the bloom filter are also in the
interesting category; hopefully an expert will get in the loop...
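
For reference, by "list of gets" I mean the batched form below (a
sketch only; the row-key format is made up, adjust it to yours):

// Fetch 1000 rows in a single batched call; this is the read path
// that row bloom filters can speed up.
List<Get> gets = new ArrayList<Get>();
for (int i = 0; i < 1000; i++) {
    gets.add(new Get(Bytes.toBytes("row-" + i))); // hypothetical keys
}
Result[] results = table.get(gets);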



On Thu, Jun 28, 2012 at 6:04 PM, Jean-Marc Spaggiari
<[email protected]> wrote:
> Oh! I see! KeyOnlyFilter is overwriting the RandomRowFilter! Bad. I
> mean, bad that I did not figure that out. Thanks for pointing it out.
> That definitely explains the difference in performance.
>
> I have activated the bloom filters with this code:
> HBaseAdmin admin = new HBaseAdmin(config);
> HTable table = new HTable(config, "test3");
> // Print the column family descriptor before the change
> System.out.println(table.getTableDescriptor().getColumnFamilies()[0]);
> HColumnDescriptor cd = table.getTableDescriptor().getColumnFamilies()[0];
> cd.setBloomFilterType(BloomType.ROW);
> // The table has to be disabled before its schema can be modified
> admin.disableTable("test3");
> admin.modifyColumn("test3", cd);
> admin.enableTable("test3");
> // Print the descriptor again to confirm BLOOMFILTER => 'ROW'
> System.out.println(table.getTableDescriptor().getColumnFamilies()[0]);
>
> And here is the result for the first attempt (using gets):
> {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
> MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
> 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
> 'true', BLOCKCACHE => 'true'}
> {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
> MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
> 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
> 'true', BLOCKCACHE => 'true'}
> Thu Jun 28 11:08:59 EDT 2012 Processing iteration 0...
> Time to read 1000 lines : 40177.0 mseconds (25 lines/seconds)
>
> 2nd: Time to read 1000 lines : 7621.0 mseconds (131 lines/seconds)
> 3rd: Time to read 1000 lines : 7659.0 mseconds (131 lines/seconds)
> After a few more iterations (about 30), I'm between 200 and 250
> lines/second, like before.
>
> Regarding the FilterList, I tried it, but now I'm getting this error
> from the servers:
> org.apache.hadoop.hbase.regionserver.LeaseException:
> org.apache.hadoop.hbase.regionserver.LeaseException: lease
> '-6376193724680783311' does not exist
> Here is the code:
>        final int linesToRead = 10000;
>        System.out.println(new java.util.Date() + " Processing iteration " +
> iteration + "... ");
>        RandomRowFilter rrf = new RandomRowFilter();
>        KeyOnlyFilter kof = new KeyOnlyFilter();
>        Scan scan = new Scan();
>        List<Filter> filters = new ArrayList<Filter>();
>        filters.add(rrf);
>        filters.add(kof);
>        FilterList filterList = new FilterList(filters);
>        scan.setFilter(filterList);
>        scan.setBatch(Math.min(linesToRead, 1000));
>        scan.setCaching(Math.min(linesToRead, 1000));
>        ResultScanner scanner = table.getScanner(scan);
>        processed = 0;
>        long timeBefore = System.currentTimeMillis();
>        for (Result result : scanner.next(linesToRead))
>        {
>                System.out.println("Result: " + result);
>                if (result != null)
>                        processed++;
>        }
>        scanner.close();
>
> It's failing on the for (Result result : scanner.next(linesToRead))
> call. I tried with linesToRead = 1000, 100, 10, and 1, with the same
> result :(
>
> I will try to find the root cause, but if you have any hints, they
> are welcome.
>
> JM
