Your first scan starts at an offset in the table and has a stop row,
whereas the other one is a full table scan with filtering (if my
understanding is correct). Compare the second scan with a straight-up
full table scan and you should see where the slowdown comes from (it
has to read everything).
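To make the difference concrete, here is a rough plain-Java sketch; a sorted map stands in for HBase's sorted row keys, and the class name and row counts are made up for illustration:

```java
import java.util.TreeMap;
import java.util.regex.Pattern;

public class ScanVsFilter {
    // Returns {rows touched by bounded scan, rows touched by filter scan, filter matches}.
    static int[] compare() {
        // A sorted map stands in for HBase's sorted row keys.
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 100; i++) {
            table.put(String.format("row%03d", i), "value" + i);
        }

        // Bounded scan: seeks straight to [row010, row020) and touches only that range.
        int boundedExamined = table.subMap("row010", "row020").size();

        // Filter-only scan: every row key is read and tested against the pattern.
        Pattern filter = Pattern.compile("row01\\d");
        int filteredExamined = 0;
        int matched = 0;
        for (String key : table.keySet()) {
            filteredExamined++;
            if (filter.matcher(key).matches()) {
                matched++;
            }
        }
        return new int[] { boundedExamined, filteredExamined, matched };
    }

    public static void main(String[] args) {
        int[] r = compare();
        // Same 10 rows either way, but the filter scan had to touch all 100 keys.
        System.out.println(r[0] + " touched vs " + r[1] + " touched for " + r[2] + " matches");
    }
}
```

Both approaches return the same rows; the filter variant just pays for reading every key, which is where your 6 seconds go.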

Hope that helps,

J-D

On Thu, Aug 4, 2011 at 3:57 AM, Steinmaurer Thomas
<[email protected]> wrote:
> Hi Andy and Ted!
>
> Thanks for your reply. Basically, I'm currently trying a range scan and a
> regex row filter on a very small table (~115K rows), just to get used to it.
> Hadoop/HBase ... is running in the available Cloudera VM.
>
> I have the following row key, as already discussed in other threads.
>
> vehicle_id: up to 16 characters
> device_id: up to 16 characters
> timestamp: YYYYMMDDhhmmss
>
> Pretty much one row every 5 minutes for a particular vehicle and device.
>
> Now I want to get the rows for an entire day for a particular vehicle and 
> device.
>
> The following range scan implementation:
>
>        Scan scan = new Scan();
>
>        String startKey =
>                String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' ', '0') // Vehicle ID
>                + "-"
>                + String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "1").replace(' ', '0') // Device ID
>                + "-"
>                + "20110808000000";
>        String endKey =
>                String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' ', '0') // Vehicle ID
>                + "-"
>                + String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "1").replace(' ', '0') // Device ID
>                + "-"
>                + "20110808235959";
>        scan.setStartRow(Bytes.toBytes(startKey));
>        scan.setStopRow(Bytes.toBytes(endKey));
>        scan.addColumn(Bytes.toBytes("data_details"), Bytes.toBytes("temperature1_value"));
>
> Takes < 1 sec.
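As an aside, the padding logic above can be checked in isolation. The value of HBASE_ROWKEY_DATASOURCEID_FORMAT isn't shown in the mail; "%16s" (right-align to width 16) is assumed below because it lines up with the `.{14}57` prefix in the regex further down, and the class name is made up:

```java
public class RowKeyDemo {
    // Assumed value; the constant itself isn't shown in the mail, but "%16s"
    // is consistent with the ".{14}57" prefix in the mail's regex filter.
    static final String HBASE_ROWKEY_DATASOURCEID_FORMAT = "%16s";

    static String makeKey(String vehicleId, String deviceId, String timestamp) {
        // Left-pad each id with zeros to 16 characters, then join the parts with dashes.
        return String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, vehicleId).replace(' ', '0')
                + "-"
                + String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, deviceId).replace(' ', '0')
                + "-"
                + timestamp;
    }

    public static void main(String[] args) {
        System.out.println(makeKey("57", "1", "20110808000000"));
        // -> 0000000000000057-0000000000000001-20110808000000
    }
}
```

One edge case worth double-checking: a Scan's stop row is exclusive, so a row stamped exactly 20110808235959 would be skipped; using "20110809000000" as the stop key covers the whole day.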
>
> Whereas the following regex based row filter implementation:
>
>        List<Filter> filters = new ArrayList<Filter>();
>        RowFilter rf = new RowFilter(
>                CompareFilter.CompareOp.EQUAL
>                , new RegexStringComparator(".{14}57\\-.{15}1\\-20110808.{6}")
>        );
>        filters.add(rf);
>
>        QualifierFilter qf = new QualifierFilter(
>                CompareFilter.CompareOp.EQUAL
>                , new RegexStringComparator("temperature1_value")
>        );
>        filters.add(qf);
>
>        FilterList filterList1 = new FilterList(filters);
>        scan.setFilter(filterList1);
>
>
> Takes around 6 sec on a very small table.
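That 6 seconds is expected: the RowFilter runs server-side but still reads every row. At least the pattern itself can be checked against the keys without a cluster, since RegexStringComparator uses ordinary Java regex semantics (class name hypothetical):

```java
import java.util.regex.Pattern;

public class RowFilterRegexDemo {
    // Same pattern as in the mail's RowFilter; the escaped dash is
    // harmless, a plain "-" would match the same keys.
    static final Pattern ROW_PATTERN =
            Pattern.compile(".{14}57\\-.{15}1\\-20110808.{6}");

    static boolean matches(String rowKey) {
        return ROW_PATTERN.matcher(rowKey).matches();
    }

    public static void main(String[] args) {
        // A key for vehicle 57, device 1, during 2011-08-08 should match.
        System.out.println(matches("0000000000000057-0000000000000001-20110808123000"));
    }
}
```

Since the vehicle and device sit at the front of the key here, the whole day is reachable with the start/stop-row scan alone; a RowFilter mainly earns its cost when the varying part is not a key prefix. If a regex is still needed, it can be combined with setStartRow/setStopRow so the filter only runs over the bounded range.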
>
>
> We aren't sure if we need the regex row filter capabilities at all or if
> range scans are sufficient for our access pattern. But a better understanding
> of how to optimize regex stuff would be helpful.
>
>
> Thanks!
>
> Thomas
>
>
> -----Original Message-----
> From: Andrew Purtell [mailto:[email protected]]
> Sent: Wednesday, 27 July 2011 08:25
> To: [email protected]
> Subject: Re: Something like Execution Plan as in the RDBMS world?
>
>> Or is this a complete different thinking?
>
> Yes.
>
> There isn't an "execution plan" when using HBase, as that term is commonly
> understood in the RDBMS world. The commands you issue against HBase using the
> client API are executed in the order you issue them.
>
>> Depending on the access pattern, we might be in a situation to use
>>e.g. RegEx filters on rowkeys. I wonder if there is some kind of
>>execution plan when running an HBase query to better understand
>
> Exposing filter statistics (hit/skip ratio etc.) and other per-query metrics
> like number of store files read, how many keys were examined, etc. is an
> interesting idea, perhaps along the lines of what you're asking, but HBase
> does not support that level of query performance introspection at the moment.
>
> What people do is measure the application metrics of interest and try 
> different approaches to optimize them.
>
> Best regards,
>
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
> Tom White)
>
>
>>________________________________
>>From: Steinmaurer Thomas <[email protected]>
>>To: [email protected]
>>Sent: Tuesday, July 26, 2011 11:10 PM
>>Subject: Something like Execution Plan as in the RDBMS world?
>>
>>Hello,
>>
>>we have a three part row-key taking into account that the first part is
>>important for distribution/partitioning when the system grows.
>>Depending on the access pattern, we might be in a situation to use e.g.
>>RegEx filters on rowkeys. I wonder if there is some kind of
>>execution plan (as known from RDBMSs) when running an HBase query to better
>>understand how HBase processes the query and what execution path it
>>takes to generate the result set.
>>
>>Or is this a complete different thinking?
>>
>>Thanks,
>>
>>Thomas
>>
>
